Publications Search

Schonbein, William W.; Barrett, Brian W.; Brightwell, Ronald B.; Grant, Ryan E.; Hemmert, Karl S.; Foulk, James W.; Underwood, Keith; Riesen, Rolf; Hoefler, Torsten; Barbe, Mathieu; Suraty Filho, Luiz H.; Ratchov, Alexandre; Maccabe, Arthur B.

This report presents a specification for the Portals 4 network programming interface. Portals 4 is intended to allow scalable, high-performance network communication between nodes of a parallel computing system. Portals 4 is well suited to massively parallel processing and embedded systems. Portals 4 represents an adaption of the data movement layer developed for massively parallel processing platforms, such as the 4500-node Intel TeraFLOPS machine. Sandia's Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4 is targeted to the next generation of machines employing advanced network interface architectures that support enhanced offload capabilities.

TYPE SAND Report YEAR 2022

DOI OSTI

SNL ATDM Software Ecosystem Operating Systems and On-Node Runtime

Olivier, Stephen L.; Brightwell, Ronald B.; Dosanjh, Matthew G.; Ferreira, Kurt; Levy, Scott L.N.; Foulk, James W.; Younge, Andrew J.

Abstract not provided.

TYPE Presentation YEAR 2022

OSTI

SNL ATDM Software Ecosystem Operating Systems and On-Node Runtime

Olivier, Stephen L.; Brightwell, Ronald B.; Ferreira, Kurt; Grant, Ryan; Levy, Scott L.N.; Foulk, James W.; Younge, Andrew J.

Abstract not provided.

TYPE Presentation YEAR 2021

OSTI

ALAMO: Autonomous lightweight allocation, management, and optimization

Communications in Computer and Information Science

Brightwell, Ronald B.; Ferreira, Kurt; Grant, Ryan; Levy, Scott L.N.; Lofstead, Gerald F.; Olivier, Stephen L.; Foulk, James W.; Younge, Andrew J.; Gentile, Ann C.; Foulk, James W.

Several recent workshops conducted by the DOE Advanced Scientific Computing Research program have established the fact that the complexity of developing applications and executing them on high-performance computing (HPC) systems is rising at a rate which will make it nearly impossible to continue to achieve higher levels of performance and scalability. Absent an alternative approach to managing this ever-growing complexity, HPC systems will become increasingly difficult to use. A more holistic approach to designing and developing applications and managing system resources is required. This paper outlines a research strategy for managing the increasing complexity by providing the programming environment, software stack, and hardware capabilities needed for autonomous resource management of HPC systems. Developing portable applications for a variety of HPC systems of varying scale requires a paradigm shift from the current approach, where applications are painstakingly mapped to individual machine resources, to an approach where machine resources are automatically mapped and optimized to applications as they execute. Achieving such automated resource management for HPC systems is a daunting challenge that requires significant sustained investment in exploring new approaches and novel capabilities in software and hardware that span the spectrum from programming systems to device-level mechanisms. This paper provides an overview of the functionality needed to enable autonomous resource management and optimization and describes the components currently being explored at Sandia National Laboratories to help support this capability.

TYPE Conference Poster YEAR 2021

OSTI Scopus

HPC Operating System Research Areas and Challenges

Foulk, James W.; Brightwell, Ronald B.; Younge, Andrew J.; Lange, Jack

Abstract not provided.

TYPE Presentation YEAR 2020

OSTI

The Hardware of Smaller Clusters (V.3.0)

Lacy, Susan W.; Brightwell, Ronald B.

Chris Saunders and three technologists are in high demand from Sandia’s deep learning teams, and they’re kept busy by building new clusters of computer nodes for researchers who need the power of supercomputing on a smaller scale. Sandia researchers working on Laboratory Directed Research & Development (LDRD) projects, or innovative ideas for solutions on short timeframes, formulate new ideas on old themes and frequently rely on smaller cluster machines to help solve problems before introducing their code to larger HPC resources. These research teams need an agile hardware and software environment where nascent ideas can be tested and cultivated on a smaller scale.

TYPE Other Report YEAR 2020

DOI OSTI

Chronicles of Astra: Challenges and lessons from the first petascale Arm supercomputer

International Conference for High Performance Computing, Networking, Storage and Analysis, SC

Foulk, James W.; Younge, Andrew J.; Hammond, Simon; Foulk, James W.; Curry, Matthew; Aguilar, Michael J.; Hoekstra, Robert J.; Brightwell, Ronald B.

Arm processors have been explored in HPC for several years; however, there has not yet been a demonstration of viability for supporting large-scale production workloads. In this paper, we offer a retrospective on the process of bringing up Astra, the first Petascale supercomputer based on 64-bit Arm processors, and validating its ability to run production HPC applications. Through this process several immature technology gaps were addressed, including software stack enablement, Linux bugs at scale, thermal management issues, power management capabilities, and advanced container support. From this experience, several lessons learned are formulated that contributed to the successful deployment of Astra. These insights can be helpful to accelerate deploying and maturing other first-seen HPC technologies. With Astra now supporting many users running a diverse set of production applications at multi-thousand node scales, we believe this constitutes strong supporting evidence that Arm is a viable technology for even the largest-scale supercomputer deployments.

TYPE Conference Poster YEAR 2020

OSTI Scopus

SNL ATDM Software Technologies. ECP Capability Assessment Report for Software Technologies

Oldfield, Ron; Wolf, Michael; Brightwell, Ronald B.

The Exascale Computing Project (ECP) Capability Assessment Report for Software Technologies at Sandia National Laboratories is provided. The projects are now aggregated to include Kokkos, Kokkos Kernels, VTK-m, and Operating Systems and On-Node Runtime efforts. Key challenges and solution strategies are presented for each.

TYPE Other Report YEAR 2020

DOI OSTI

September 2019 ECP ST Project Review

Trujillo, Gabrielle; Turner, D.Z.; Brightwell, Ronald B.; Oldfield, Ron; Clay, Robert L.

Abstract not provided.

TYPE Presentation YEAR 2019

OSTI

Meeting the Future Needs of HPC with MPI

Brightwell, Ronald B.

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

Opportunities and Challenges for Accelerated Network Interfaces in HPC

Brightwell, Ronald B.

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

Thoughts on Autonomous Resource Management for HPC

Brightwell, Ronald B.

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

Memory Technology Impacts on Current Near-Term and Future Systems

Brightwell, Ronald B.

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

Meeting the Future Needs of HPC with MPI

Brightwell, Ronald B.

Abstract not provided.

TYPE Presentation YEAR 2019

OSTI

Finepoints: Partitioned multithreaded MPI communication

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Grant, Ryan; Dosanjh, Matthew G.; Levenhagen, Michael; Brightwell, Ronald B.; Skjellum, Anthony

The MPI multithreading model has been historically difficult to optimize; the interface that it provides for threads was designed as a process-level interface. This model has led to implementations that treat function calls as critical regions and protect them with locks to avoid race conditions. We hypothesize that an interface designed specifically for threads can provide superior performance than current approaches and even outperform single-threaded MPI. In this paper, we describe a design for partitioned communication in MPI that we call finepoints. First, we assess the existing communication models for MPI two-sided communication and then introduce finepoints as a hybrid of MPI models that has the best features of each existing MPI communication model. In addition, “partitioned communication” created with finepoints leverages new network hardware features that cannot be exploited with current MPI point-to-point semantics, making this new approach both innovative and useful both now and in the future. To demonstrate the validity of our hypothesis, we implement a finepoints library and show improvements against a state-of-the-art multithreaded optimized Open MPI implementation on a Cray XC40 with an Aries network. Our experiments demonstrate up to a 12× reduction in wait time for completion of send operations. This new model is shown working on a nuclear reactor physics neutron-transport proxy-application, providing up to 26.1% improvement in communication time and up to 4.8% improvement in runtime over the best performing MPI communication mode, single-threaded MPI.

TYPE Conference Poster YEAR 2019

DOI OSTI Scopus

SNL ATDM Software Ecosystem

Olivier, Stephen L.; Brightwell, Ronald B.; Foulk, James W.; Younge, Andrew J.; Evans, Noah; Levy, Scott L.N.; Ferreira, Kurt; Grant, Ryan

Abstract not provided.

TYPE Presentation YEAR 2018

OSTI

The Portals 4.2 Network Programming Interface

Barrett, Brian W.; Brightwell, Ronald B.; Grant, Ryan; Hemmert, Karl S.; Foulk, James W.; Wheeler, Kyle; Riesen, Rolf; Hoefler, Torsten; Maccabe, Arthur B.; Hudson, Trammell

This report presents a specification for the Portals 4 network programming interface. Portals 4 is intended to allow scalable, high-performance network communication between nodes of a parallel computing system. Portals 4 is well suited to massively parallel processing and embedded systems. Portals 4 represents an adaption of the data movement layer developed for massively parallel processing platforms, such as the 4500-node Intel TeraFLOPS machine. Sandia's Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4 is targeted to the next generation of machines employing advanced network interface architectures that support enhanced offload capabilities.

TYPE SAND Report YEAR 2018

DOI OSTI

Hardware/Software Co-Design for High Performance Interconnects for Extreme-Scale Systems

Brightwell, Ronald B.

Abstract not provided.

TYPE Presentation YEAR 2018

OSTI

Vanguard Astra: Maturing the ARM Software Ecosystem for U.S. DOE/ASC Supercomputing

Brightwell, Ronald B.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

System Software Perspective on Resilience

Brightwell, Ronald B.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Improving MPI Multi-threaded RMA Performance

Hjelm, Nathan; Dosanjh, Matthew G.; Groves, Taylor; Grant, Ryan; Brightwell, Ronald B.; Bridges, Patrick; Arnold, Dorian

Abstract not provided.

TYPE Conference Poster YEAR 2018

DOI OSTI

Portals 4: Status of Specification and Implementation

Younge, Andrew J.; Grant, Ryan; Brightwell, Ronald B.

Abstract not provided.

TYPE Presentation YEAR 2018

OSTI

Architectural Convergence of Big Data and Extreme-Scale Computing: Marriage of Convenience or Conviction

Brightwell, Ronald B.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Resource Management Challenges in the Era of Extreme Heterogeneity

Brightwell, Ronald B.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

A Tale of Two Systems: Using Containers to Deploy HPC Applications on Supercomputers and Clouds

Younge, Andrew J.; Foulk, James W.; Grant, Ryan; Brightwell, Ronald B.

Abstract not provided.

TYPE Conference Poster YEAR 2018

DOI OSTI

Enhancing Qthreads for ECP Science and Energy Impact

Brightwell, Ronald B.; Olivier, Stephen L.

Abstract not provided.

TYPE Presentation YEAR 2017

OSTI

ATDM Operating Systems and On-Node Runtime

Olivier, Stephen L.; Foulk, James W.; Brightwell, Ronald B.

Abstract not provided.

TYPE Presentation YEAR 2017

OSTI

A Tale of Two Systems: Using Containers to Deploy HPC Applications on Supercomputers and Clouds

Proceedings of the International Conference on Cloud Computing Technology and Science, CloudCom

Younge, Andrew J.; Foulk, James W.; Grant, Ryan; Brightwell, Ronald B.

Containerization, or OS-level virtualization, has taken root within the computing industry. However, container utilization and its impact on performance and functionality within High Performance Computing (HPC) is still relatively undefined. This paper investigates the use of containers with advanced supercomputing and HPC system software. With this, we define a model for parallel MPI application DevOps and deployment using containers to enhance development effort and provide container portability from laptop to clouds or supercomputers. In this endeavor, we extend the use of Singularity containers to a Cray XC-series supercomputer. We use the HPCG and IMB benchmarks to investigate potential points of overhead and scalability with containers on a Cray XC30 testbed system. Furthermore, we also deploy the same containers with Docker on Amazon's Elastic Compute Cloud (EC2), and compare against our Cray supercomputer testbed. Our results indicate that Singularity containers operate at native performance when dynamically linking Cray's MPI libraries on a Cray supercomputer testbed, and that while Amazon EC2 may be useful for initial DevOps and testing, scaling HPC applications better fits supercomputing resources like a Cray.

TYPE Conference Poster YEAR 2017

DOI OSTI Scopus

December 2017 ECP ST Project Review: ECP Project WBS 2.3.5.04 (SNL ATDM Software Ecosystem)

Brightwell, Ronald B.

Abstract not provided.

TYPE Other Report YEAR 2017

DOI OSTI

December 2017 ECP ST Project Review: ECP Project WBS 2.3.1.15 (Qthreads)

Brightwell, Ronald B.; Olivier, Stephen L.

Abstract not provided.

TYPE Presentation YEAR 2017

OSTI

Enabling Diverse Software Stacks on Supercomputers Using High Performance Virtual Clusters

Proceedings - IEEE International Conference on Cluster Computing, ICCC

Younge, Andrew J.; Foulk, James W.; Grant, Ryan; Gaines, Brian; Brightwell, Ronald B.

While large-scale simulations have been the hallmark of the High Performance Computing (HPC) community for decades, Large Scale Data Analytics (LSDA) workloads are gaining attention within the scientific community not only as a processing component to large HPC simulations, but also as standalone scientific tools for knowledge discovery. With the path towards Exascale, new HPC runtime systems are also emerging in a way that differs from classical distributed computing models. However, system software for such capabilities on the latest extreme-scale DOE supercomputing needs to be enhanced to more appropriately support these types of emerging software ecosystems. In this paper, we propose the use of Virtual Clusters on advanced supercomputing resources to enable systems to support not only HPC workloads, but also emerging big data stacks. Specifically, we have deployed the KVM hypervisor within Cray's Compute Node Linux on an XC-series supercomputer testbed. We also use libvirt and QEMU to manage and provision VMs directly on compute nodes, leveraging Ethernet-over-Aries network emulation. To our knowledge, this is the first known use of KVM on a true MPP supercomputer. We investigate the overhead of our solution using HPC benchmarks, both evaluating single-node performance as well as weak scaling of a 32-node virtual cluster. Overall, we find single node performance of our solution using KVM on a Cray is very efficient with near-native performance. However, overhead increases by up to 20% as virtual cluster size increases, due to limitations of the Ethernet-over-Aries bridged network. Furthermore, we deploy Apache Spark with large data analysis workloads in a Virtual Cluster, effectively demonstrating how diverse software ecosystems can be supported by High Performance Virtual Clusters.

TYPE Conference Poster YEAR 2017

OSTI Scopus

Enabling Diverse Software Stacks on Supercomputers using High Performance Virtual Clusters

Younge, Andrew J.; Foulk, James W.; Grant, Ryan; Gaines, Brian; Brightwell, Ronald B.

Abstract not provided.

TYPE Conference Poster YEAR 2017

OSTI

What Will Determine the Future Success of MPI?

Brightwell, Ronald B.

Abstract not provided.

TYPE Conference Poster YEAR 2017

OSTI

Sandia's ARM-centric Co-Design Strategy: Introduction to the NNSA/ASC Vanguard Project

Ang, James A.; Brightwell, Ronald B.; Hammond, Simon; Hemmert, Karl S.; Hoekstra, Robert J.; Foulk, James W.; Foulk, James W.; Rodrigues, Arun

Abstract not provided.

TYPE Conference Poster YEAR 2017

OSTI

HPC Co-Design

Brightwell, Ronald B.

Abstract not provided.

TYPE Presentation YEAR 2017

OSTI

Preparing MPI for Exascale

Grant, Ryan; Dosanjh, Matthew G.; Brightwell, Ronald B.

Abstract not provided.

TYPE Presentation YEAR 2017

OSTI

Challenges and Opportunities for HPC Interconnects and MPI

Brightwell, Ronald B.

Abstract not provided.

TYPE Presentation YEAR 2017

OSTI

Embracing Diversity: OS Support for Integrating High-Performance Computing and Data Analytics

Brightwell, Ronald B.; Foulk, James W.

Abstract not provided.

TYPE Presentation YEAR 2017

OSTI

The Portals 4.1 Network Programming Interface

Barrett, Brian; Brightwell, Ronald B.; Grant, Ryan; Hemmert, Karl S.; Foulk, James W.; Wheeler, Kyle; Underwood, Keith D.; Riesen, Rolf; Maccabe, Arthur B.; Hudson, Trammell

This report presents a specification for the Portals 4 network programming interface. Portals 4 is intended to allow scalable, high-performance network communication between nodes of a parallel computing system. Portals 4 is well suited to massively parallel processing and embedded systems. Portals 4 represents an adaption of the data movement layer developed for massively parallel processing platforms, such as the 4500-node Intel TeraFLOPS machine. Sandia's Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4 is targeted to the next generation of machines employing advanced network interface architectures that support enhanced offload capabilities.

TYPE SAND Report YEAR 2017

DOI OSTI

Qthreads and On-Node Runtime Coordination

Olivier, Stephen L.; Brightwell, Ronald B.

Abstract not provided.

TYPE Presentation YEAR 2017

OSTI

sPIN: High-performance streaming Processing in the Network

International Conference for High Performance Computing, Networking, Storage and Analysis, SC

Hoefler, Torsten; Di Girolamo, Salvatore; Taranov, Konstantin; Grant, Ryan; Brightwell, Ronald B.

Optimizing communication performance is imperative for large-scale computing because communication overheads limit the strong scalability of parallel applications. Today's network cards contain rather powerful processors optimized for data movement. However, these devices are limited to fixed functions, such as remote direct memory access. We develop sPIN, a portable programming model to offload simple packet processing functions to the network card. To demonstrate the potential of the model, we design a cycle-accurate simulation environment by combining the network simulator LogGOPSim and the CPU simulator gem5. We implement offloaded message matching, datatype processing, and collective communications and demonstrate transparent full-application speedups. Furthermore, we show how sPIN can be used to accelerate redundant in-memory filesystems and several other use cases. Our work investigates a portable packet-processing network acceleration model similar to compute acceleration with CUDA or OpenCL. We show how such network acceleration enables an ecosystem that can significantly speed up applications and system services.

TYPE Conference Poster YEAR 2017

OSTI Scopus