TEMPI: An Interposed MPI Library with Canonical Representation of MPI Datatypes [Slides]
This presentation covers the following topics: distributed GPU stencils and non-contiguous data; equivalence of strided datatypes and their minimal representation; GPU communication methods; deployment on managed systems; large messages and MPI datatypes; translation and canonicalization; automatic model-driven transfer method selection; and the interposed library implementation.