Publications
Low-Communication Asynchronous Distributed Generalized Canonical Polyadic Tensor Decomposition
Lewis, Cannada L.; Phipps, Eric T.
In this work, we show that reduced-communication algorithms for distributed stochastic gradient descent improve the time per epoch and strong scaling for the Generalized Canonical Polyadic (GCP) tensor decomposition, but at a cost: convergence becomes more difficult to achieve. The MPI-based implementation shows that, while one-sided algorithms offer a path to asynchronous execution, the performance benefits of an optimized allreduce are difficult to best.