Low-Communication Asynchronous Distributed Generalized Canonical Polyadic Tensor Decomposition
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
In this position paper we will address challenges and opportunities relating to the design and codesign of application specific circuits. Given our background as computational scientists, our perspective is from the viewpoint of a highly motivated application developer as opposed to career computer architects
MLIR (Multi-Level Intermediate Representation), is an extensible compiler framework that supports high-level data structures and operation constructs. These higher-level code representations are particularly applicable to the artificial intelligence and machine learning (AI/ML) domain, allowing developers to more easily support upcoming heterogeneous AI/ML accelerators and develop flexible domain specific compilers/frameworks with higher-level intermediate representations (IRs) and advanced compiler optimizations. The result of using MLIR within the LLVM compiler framework is expected to yield significant improvement in the quality of generated machine code, which in turn will result in improved performance and hardware efficiency
Abstract not provided.
2021 IEEE High Performance Extreme Computing Conference, HPEC 2021
In this work, we show that reduced communication algorithms for distributed stochastic gradient descent improve the time per epoch and strong scaling for the Generalized Canonical Polyadic (GCP) tensor decomposition, but with a cost, achieving convergence becomes more difficult. The implementation, based on MPI, shows that while one-sided algorithms offer a path to asynchronous execution, the performance benefits of optimized allreduce are difficult to best.
This report details work to study trade-offs in topology and network bandwidth for potential interconnects in the exascale (2021-2022) timeframe. The work was done using multiple interconnect models across two parallel discrete event simulators. Results from each independent simulator are shown and discussed and the areas of agreement and disagreement are explored.
Abstract not provided.
Abstract not provided.
Abstract not provided.