Publications Search

The Portals 4.1 Network Programming Interface

Barrett, Brian W.; Brightwell, Ronald B.; Grant, Ryan E.; Hemmert, Karl S.; Pedretti, Kevin P.; Wheeler, Kyle W.; Underwood, Keith; Riesen, Rolf R.; Maccabe, Arthur B.; Hudson, Trammel H.

This report presents a specification for the Portals 4 networ k programming interface. Portals 4 is intended to allow scalable, high-performance network communication betwee n nodes of a parallel computing system. Portals 4 is well suited to massively parallel processing and embedded syste ms. Portals 4 represents an adaption of the data movement layer developed for massively parallel processing platfor ms, such as the 4500-node Intel TeraFLOPS machine. Sandia's Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4 is tar geted to the next generation of machines employing advanced network interface architectures that support enh anced offload capabilities.

More Details

TYPE SAND Report YEAR 2017

OSTI DOI

An architecture to perform NIC based MPI matching

Proceedings - IEEE International Conference on Cluster Computing, ICCC

Hemmert, Karl S.; Underwood, Keith; Rodrigues, Arun

Modern supercomputers aggregate thousands of microprocessors through a high performance network. Many of these systems place a processor on the network interface controller (NIC) to handle some portion of the MPI processing. This processing involves traversing a linked list and invoking a matching function for each item. Although this task is critical to the performance of the system, microprocessors perform it extremely poorly. Furthermore, the traditional network processor approaches of multicore and multithreading map poorly to the problem because the list is a shared data structure. While match processing can be implemented directly in hardware, hardware implementations can be extremely inflexible and lead to extremely high risk. This paper presents a novel, programmable architecture for a processor to handle the matching function. The matching engine approaches the performance of a direct hardware implementation while maintaining a high degree of flexibility and programmability. More importantly, it requires a dramatically smaller area than a conventional processor. © 2007 IEEE.

More Details

TYPE Conference YEAR 2007

Scopus OSTI

Evaluating NIC hardware requirements to achieve high message rate PGAS support on multi-core processors

Proceedings of the 2007 ACM/IEEE Conference on Supercomputing, SC'07

Underwood, Keith; Levenhagen, Michael J.; Brightwell, Ronald B.

Partitioned global address space (PGAS) programming models have been identified as one of the few viable approaches for dealing with emerging many-core systems. These models tend to generate many small messages, which requires specific support from the network interface hardware to enable efficient execution. In the past, Cray included E-registers on the Cray T3E to support the SHMEM API; however, with the advent of multi-core processors, the balance of computation to communication capabilities has shifted toward computation. This paper explores the message rates that are achievable with multi-core processors and simplified PGAS support on a more conventional network interface. For message rate tests, we find that simple network interface hardware is more than sufficient. We also find that even typical data distributions, such as cyclic or block-cyclic, do not need specialized hardware support. Finally, we assess the impact of such support on the well known RandomAccess benchmark. (c) 2007 ACM.

More Details

TYPE Conference YEAR 2007

Scopus OSTI

Approaching the Exascale? Most aren't even ready for the petascale

Underwood, Keith

Abstract not provided.