Publications
Optimizing Distributed Load Balancing for Workloads with Time-Varying Imbalance
Lifflander, Jonathan; Slattengren, Nicole S.; Pébaÿ, Philippe P.; Miller, Phil; Rizzi, Francesco N.; Bettencourt, Matthew T.
This paper explores dynamic load balancing algorithms used by asynchronous many-task (AMT), or 'task-based', programming models to optimize task placement for scientific applications with dynamic workload imbalances. AMT programming models use overdecomposition of the computational domain. Overdecomposition provides a natural mechanism for domain developers to expose concurrency and break their computational domain into pieces that can be remapped to different hardware. This paper explores fully distributed load balancing strategies that have shown great promise for exascale-level computing but are challenging to reason about theoretically and to implement effectively. We present a novel theoretical analysis of a gossip-based load balancing protocol and use it to build an efficient implementation with fast convergence rates and high load balancing quality. We demonstrate our algorithm in a next-generation plasma physics application (EMPIRE) that induces time-varying workload imbalance due to spatial non-uniformity in particle density across the domain. Our highly scalable, novel load balancing algorithm achieves over a 3x speedup (particle work) compared to a bulk-synchronous MPI implementation without load balancing.
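To make the idea of a gossip-based, fully distributed load balancer concrete, the following is a minimal single-process sketch in Python. It is a generic illustration, not the algorithm analyzed in the paper: the function names (`gossip_round`, `gossip_balance`), the parameters (`rounds`, `fanout`, `seed`), and the transfer rule (move a task only if the recipient stays at or below the average load) are all assumptions chosen for brevity. Each list entry stands in for one rank holding several overdecomposed tasks.

```python
import random


def gossip_round(known, fanout, rng):
    """One gossip round: each rank forwards its known loads to random peers."""
    n = len(known)
    k = min(fanout, n - 1)
    updates = [dict() for _ in range(n)]
    for rank in range(n):
        peers = rng.sample([p for p in range(n) if p != rank], k)
        for peer in peers:
            updates[peer].update(known[rank])
    # Apply all messages only after every rank has sent (synchronous round).
    for rank in range(n):
        known[rank].update(updates[rank])


def gossip_balance(tasks, rounds=3, fanout=2, seed=0):
    """Return a new task placement (hypothetical transfer rule, see lead-in).

    tasks: list with one entry per rank; each entry is a list of task loads.
    """
    rng = random.Random(seed)
    n = len(tasks)
    loads = [sum(t) for t in tasks]
    avg = sum(loads) / n

    # Underloaded ranks seed the gossip with their own load.
    known = [{r: loads[r]} if loads[r] < avg else {} for r in range(n)]
    for _ in range(rounds):
        gossip_round(known, fanout, rng)

    new_tasks = [list(t) for t in tasks]
    for rank in range(n):
        if loads[rank] <= avg:
            continue
        view = dict(known[rank])  # this rank's local, possibly partial view
        for task in sorted(tasks[rank], reverse=True):
            if sum(new_tasks[rank]) <= avg:
                break
            # Only consider recipients that would stay at or below average.
            candidates = [p for p, load in view.items() if load + task <= avg]
            if not candidates:
                continue
            target = min(candidates, key=lambda p: view[p])
            new_tasks[rank].remove(task)
            new_tasks[target].append(task)
            view[target] += task
    return new_tasks
```

In this sketch each overloaded rank decides from its own local view only, so no global synchronization or central coordinator is needed. The trade-off is that views may be incomplete after a few rounds, and several overloaded ranks could pick the same recipient; a real implementation would need to handle both cases.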