Publications
TEMPI: An Interposed MPI Library with Canonical Representation of MPI Datatypes [Poster]
Pearson, Carl W.; Wu, Kun W.; Chung, I-Hsin C.; Xiong, Jinjun X.; Hwu, Wen-mei H.
TEMPI provides a transparent non-contiguous data-handling layer compatible with various MPI implementations. MPI datatypes are a powerful abstraction that allows an MPI implementation to operate on non-contiguous data. CUDA-aware MPI implementations must also manage the transfer of such data between the host system and the GPU. The non-unique and recursive nature of MPI datatypes means that providing fast GPU handling is a challenge: the same non-contiguous pattern may be described in a variety of ways, all of which should be treated equivalently by an implementation. This work introduces a novel technique for treating equivalent descriptions of strided datatypes identically. The best method for transferring non-contiguous data between the CPU and GPU depends on the properties of the data layout, and this work shows that a simple performance model can accurately select the fastest method. Unfortunately, the combination of MPI software and system hardware available may not provide sufficient performance. The contributions of this work are deployed on OLCF Summit through an interposer library that requires no privileged access to the system.