The high performance computing industry is undergoing a period of substantial change, not least because of fabrication and lithographic challenges in manufacturing next-generation processors. As these challenges mount, the industry is looking to extract higher performance from additional functionality in the micro-architecture space, as well as placing greater emphasis on efficiency in the design of network-on-chip resources and memory subsystems. Such variation in design opens opportunities for new entrants in the data center and server markets, where varying compute-to-memory ratios can present end users with more efficient node designs for particular workloads. In this paper we evaluate the recently released Marvell ThunderX2 Arm processor, arguably the first HPC-capable Arm design available in the marketplace. We perform a set of micro-benchmark and mini-application evaluations on the ThunderX2, comparing it with Intel's Haswell and Skylake Xeon server parts commonly used in contemporary HPC designs. Our findings show that no single processor performs best across all benchmarks, but that the ThunderX2 excels in areas demanding high memory bandwidth due to the provisioning of more memory channels in its design. We conclude that the ThunderX2 is a serious contender in the HPC server segment and has the potential to offer supercomputing sites a viable high-performance alternative to existing designs from established industry players.
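The memory-bandwidth advantage follows largely from channel count. As a rough sketch (not taken from the paper), theoretical peak per-socket DRAM bandwidth can be estimated as channels × transfer rate × 8 bytes per transfer. The per-socket channel counts below reflect the published designs (8 for ThunderX2, 6 for Skylake-SP, 4 for Haswell-EP), but the DDR4 speed grades are assumptions chosen for illustration, not the configurations measured in the study.

```python
# Sketch: theoretical peak DRAM bandwidth per socket.
# A DDR4 channel has a 64-bit (8-byte) data bus, so
#   peak bytes/s = channels * transfers/s * 8
BYTES_PER_TRANSFER = 8


def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: int) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return channels * mega_transfers_per_s * 1e6 * BYTES_PER_TRANSFER / 1e9


# Channel counts per socket follow the published designs; the DDR4 speed
# grades are illustrative assumptions, not the paper's test configurations.
EXAMPLES = {
    "ThunderX2, 8 ch, DDR4-2666 (assumed)": (8, 2666),
    "Skylake-SP, 6 ch, DDR4-2666 (assumed)": (6, 2666),
    "Haswell-EP, 4 ch, DDR4-2133 (assumed)": (4, 2133),
}

if __name__ == "__main__":
    for name, (channels, mts) in EXAMPLES.items():
        print(f"{name}: {peak_bandwidth_gbs(channels, mts):.1f} GB/s")
```

Under these assumptions the script prints about 170.6, 128.0, and 68.3 GB/s respectively; sustained bandwidth measured with a benchmark such as STREAM is typically well below these theoretical peaks.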

TYPE Conference Poster YEAR 2019

Scopus OSTI

Vanguard Astra - Petascale ARM Platform for U.S. DOE/ASC Supercomputing

Hoekstra, Robert J.; Pedretti, Kevin P.; Hammond, Simon D.; Laros, James H.; Younge, Andrew J.; Lin, Paul L.; Vaughan, Courtney V.

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

Containers in HPC and Beyond

Younge, Andrew J.

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

Vanguard Astra Application Experience

Hammond, Simon D.; Laros, James H.; Pedretti, Kevin P.; Younge, Andrew J.; Vaughan, Courtenay T.; Lin, Paul L.; Hoekstra, Robert J.

Abstract not provided.

TYPE Presentation YEAR 2019

OSTI

ISC 19 Tutorial: Getting Started with Containers on HPC

Younge, Andrew J.

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

Evaluating the Marvell ThunderX2 Server Processor for HPC Workloads

Hammond, Simon D.; Hughes, Clayton H.; Levenhagen, Michael J.; Vaughan, Courtenay T.; Younge, Andrew J.; Schwaller, Benjamin S.; Aguilar, Michael J.; Pedretti, Kevin P.; Laros, James H.

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

From Containerizing Testbeds for HPC Applications to Exascale Supercontainers

Younge, Andrew J.

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

Small scale to extreme: Methods for characterizing energy efficiency in supercomputing applications

Sustainable Computing: Informatics and Systems

Younge, Andrew J.; Grant, Ryan E.; Laros, James H.; Levenhagen, Michael; Olivier, Stephen L.; Pedretti, Kevin; Ward, Lee

Power measurement capabilities are becoming commonplace on large-scale HPC system deployments. Several approaches to providing power measurements are used today, primarily in-band and out-of-band measurements. Both of these fundamental techniques can be augmented with application-level profiling, and different techniques can also be combined. However, it can be difficult to assess the type and detail of measurement needed to gain insight into the power profile of an application. In addition, the heterogeneity of modern hybrid supercomputing platforms requires that different CPU architectures be examined as well. This paper presents a taxonomy for classifying power profiling techniques on modern HPC platforms. Three relevant HPC mini-applications are analyzed across systems of multicore and manycore nodes to examine the level of detail, scope, and complexity of these power profiles. We demonstrate that a combination of out-of-band measurement with in-band application region profiling can provide an accurate, detailed view of power usage without introducing overhead. Furthermore, we confirm the energy and power profile of these mini-applications at extreme scale with the Trinity supercomputer. This finding validates the extrapolation of the power profiling techniques from a testbed scale of just several dozen nodes to extreme-scale petaflops supercomputing systems. We also provide a set of recommendations on how best to profile future HPC workloads.

TYPE Journal Article YEAR 2019

Scopus OSTI DOI

Evaluation of Hardware-Based MPI Acceleration on Astra

Aguilar, Michael J.; Pedretti, Kevin P.; Hammond, Simon D.; Laros, James H.; Younge, Andrew J.; Curry, Matthew L.

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

Vanguard Astra - Petascale ARM Platform for U.S. DOE/ASC Supercomputing

Younge, Andrew J.

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

Vanguard Astra: A Prototype Petascale Arm Supercomputer

Hughes, Clayton H.; Laros, James H.; Pedretti, Kevin P.; Hammond, Simon D.; Younge, Andrew J.; Hoekstra, Robert J.

Abstract not provided.

TYPE Presentation YEAR 2019

OSTI

Deployment and Usage of Containers for Production HPC Applications

Agelastos, Anthony M.; Warren, Aron W.; Younge, Andrew J.

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

SNL ATDM Software Ecosystem

Olivier, Stephen L.; Brightwell, Ronald B.; Pedretti, Kevin P.; Younge, Andrew J.; Evans, Noah; Levy, Scott L.; Ferreira, Kurt B.; Grant, Ryan E.

Abstract not provided.

TYPE Presentation YEAR 2019

OSTI

Data Pallets: Containerizing Storage for Reproducibility and Traceability

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Lofstead, Jay; Baker, Joshua B.; Younge, Andrew J.

Trusting simulation output is crucial for Sandia’s mission objectives. We rely on these simulations to perform our high-consequence mission tasks given national treaty obligations. Other science and modeling applications, while they may not have high-consequence results, still require the strongest levels of trust to enable using the result as the foundation for both practical applications and future research. To this end, the computing community has developed workflow and provenance systems to aid both in automating simulation and modeling execution and in determining exactly how some output was created, so that conclusions can be drawn from the data. Current approaches for workflows and provenance systems all operate at the user level and have little to no system-level support, making them fragile, difficult to use, and incomplete solutions. The introduction of container technology is a first step towards encapsulating and tracking artifacts used in creating data and resulting insights, but its current implementation is focused solely on making it easy to deploy an application in an isolated “sandbox” and maintaining a strictly read-only mode to avoid any potential changes to the application. All storage activities still use the system-level shared storage. This project explores extending the container concept to include storage as a new container type we call data pallets. Data pallets are potentially writable, automatically generated by the system based on I/O activities, and usable as a way to link the contained data back to the application and input deck used to create it.
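To make the idea of linking data back to its origin concrete, here is a minimal sketch of a provenance manifest that records content hashes of the input deck and output files alongside an application image identifier. The manifest layout and the names `build_pallet_manifest` and `application_image` are hypothetical illustrations, not the data pallet format described in the paper.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 and return its hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_pallet_manifest(app_image: str, input_deck: Path, outputs: list[Path]) -> dict:
    """Hypothetical provenance record tying outputs to the app image and input deck."""
    return {
        # Identifier of the application container (e.g., an image digest); assumed format.
        "application_image": app_image,
        "input_deck": {"path": str(input_deck), "sha256": sha256_of(input_deck)},
        "outputs": [{"path": str(p), "sha256": sha256_of(p)} for p in outputs],
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        deck = Path(tmp) / "deck.in"
        deck.write_text("nsteps = 10\n")
        result = Path(tmp) / "result.dat"
        result.write_bytes(b"simulation output")
        manifest = build_pallet_manifest("example-sim:1.0", deck, [result])
        print(json.dumps(manifest, indent=2))
```

Content hashes make the link verifiable: if either the input deck or an output file changes, its recorded digest no longer matches.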

TYPE Journal Article YEAR 2019

Scopus OSTI DOI

The Astra Supercomputer

Hammond, Simon D.; Laros, James H.; Younge, Andrew J.; Pedretti, Kevin P.; Hoekstra, Robert J.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

SC18 BOF: Containers in HPC

Younge, Andrew J.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Vanguard Astra: A Prototype Arm Supercomputer

Younge, Andrew J.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Grappling with HPC Architecture Diversity in Containers

Younge, Andrew J.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Data Pallets For Traceable Data

Lofstead, Gerald F.; Baker, Joshua B.; Younge, Andrew J.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

End-to-end Provenance Traceability and Reproducibility Through "Palletized" Simulation Data

Lofstead, Gerald F.; Younge, Andrew J.; Baker, Joshua B.

Trusting simulation output is crucial for Sandia's mission objectives. We rely on these simulations to perform our high-consequence mission tasks given our treaty obligations. Other science and modelling needs, while they may not be high-consequence, still require the strongest levels of trust to enable using the result as the foundation for both practical applications and future research. To this end, the computing community has developed workflow and provenance systems to aid both in automating simulation and modelling execution and in determining exactly how some output was created, so that conclusions can be drawn from the data. Current approaches for workflows and provenance systems all operate at the user level and have little to no system-level support, making them fragile, difficult to use, and incomplete solutions. The introduction of container technology is a first step towards encapsulating and tracking artifacts used in creating data and resulting insights, but its current implementation is focused solely on making it easy to deploy an application in an isolated "sandbox" and maintaining a strictly read-only mode to avoid any potential changes to the application. All storage activities still use the system-level shared storage. This project was an initial exploration into extending the container concept to also include storage and to use writable containers, automatically generated by the system, as a way to link the contained data back to the simulation and input deck used to create it.

TYPE SAND Report YEAR 2018

OSTI DOI

Quantifying Metrics to Evaluate Containers for Deployment and Usage of NNSA Production Applications

Younge, Andrew J.; Agelastos, Anthony M.; Lofstead, Gerald F.; Warren, Aron W.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Vanguard Astra - Petascale ARM Platform for U.S. DOE/ASC Supercomputing

Younge, Andrew J.; Laros, James H.; Hammond, Simon D.; Pedretti, Kevin P.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

FY18 L2 Milestone #8759 Report: Vanguard Astra and ATSE - an ARM-based Advanced Architecture Prototype System and Software Environment

Laros, James H.; Pedretti, Kevin P.; Hammond, Simon D.; Aguilar, Michael J.; Curry, Matthew L.; Grant, Ryan E.; Hoekstra, Robert J.; Klundt, Ruth A.; Monk, Stephen T.; Ogden, Jeffry B.; Olivier, Stephen L.; Scott, Randall D.; Ward, Harry L.; Younge, Andrew J.

The Vanguard program informally began in January 2017 with the submission of a white paper entitled "Sandia's Vision for a 2019 Arm Testbed" to NNSA headquarters. The program proceeded in earnest in May 2017 with an announcement by Doug Wade (Director, Office of Advanced Simulation and Computing and Institutional R&D at NNSA) that Sandia National Laboratories (Sandia) would host the first Advanced Architecture Prototype platform based on the Arm architecture. In August 2017, Sandia formed a Tri-lab team chartered to develop a robust HPC software stack for Astra to support the Vanguard program goal of demonstrating the viability of Arm in supporting ASC production computing workloads. This document describes the high-level Vanguard program goals, the Vanguard-Astra project acquisition plan and procurement up to contract placement, the initial software stack environment planned for the Vanguard-Astra platform (Astra), a description of how the communities of users will utilize the platform during the transition from the open network to the classified network, and initial performance results.

TYPE SAND Report YEAR 2018

OSTI DOI

FY18 L2 Milestone #6360 Report: Initial Capability of an Arm-based Advanced Architecture Prototype System and Software Environment

Laros, James H.; Pedretti, Kevin P.; Hammond, Simon D.; Aguilar, Michael J.; Curry, Matthew L.; Grant, Ryan E.; Hoekstra, Robert J.; Klundt, Ruth A.; Monk, Stephen T.; Ogden, Jeffry B.; Olivier, Stephen L.; Scott, Randall D.; Ward, Harry L.; Younge, Andrew J.

The Vanguard program informally began in January 2017 with the submission of a white paper entitled "Sandia's Vision for a 2019 Arm Testbed" to NNSA headquarters. The program proceeded in earnest in May 2017 with an announcement by Doug Wade (Director, Office of Advanced Simulation and Computing and Institutional R&D at NNSA) that Sandia National Laboratories (Sandia) would host the first Advanced Architecture Prototype platform based on the Arm architecture. In August 2017, Sandia formed a Tri-lab team chartered to develop a robust HPC software stack for Astra to support the Vanguard program goal of demonstrating the viability of Arm in supporting ASC production computing workloads. This document describes the high-level Vanguard program goals, the Vanguard-Astra project acquisition plan and procurement up to contract placement, the initial software stack environment planned for the Vanguard-Astra platform (Astra), a description of how the communities of users will utilize the platform during the transition from the open network to the classified network, and initial performance results.

TYPE SAND Report YEAR 2018

OSTI DOI

A comparison of power management mechanisms: P-States vs. node-level power cap control

Proceedings - 2018 IEEE 32nd International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2018

Pedretti, Kevin P.; Grant, Ryan E.; Laros, James H.; Levenhagen, Michael J.; Olivier, Stephen L.; Ward, Harry L.; Younge, Andrew J.

Large-scale HPC systems increasingly incorporate sophisticated power management control mechanisms. While these mechanisms are potentially useful for performing energy and/or power-aware job scheduling and resource management (EPA JSRM), greater understanding of their operation and performance impact on real-world applications is required before they can be applied effectively in practice. In this paper, we compare static p-state control to static node-level power cap control on a Cray XC system. Empirical experiments are performed to evaluate node-to-node performance and power usage variability for the two mechanisms. We find that static p-state control produces more predictable and higher performance characteristics than static node-level power cap control at a given power level. However, this performance benefit is at the cost of less predictable power usage. Static node-level power cap control produces predictable power usage but with more variable performance characteristics. Our results are not intended to show that one mechanism is better than the other. Rather, our results demonstrate that the mechanisms are complementary to one another and highlight their potential for combined use in achieving effective EPA JSRM solutions.

TYPE Conference Poster YEAR 2018

Scopus OSTI DOI

HPC at Sandia: Exploring the Virtualization and Containerization of ARM64 Processors for Future HPC Workloads

Younge, Andrew J.; Pedretti, Kevin P.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

HPC Containerization with Singularity

Younge, Andrew J.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Leveraging Containerization for DevOps with Sandia's HPC Workloads

Younge, Andrew J.

Abstract not provided.

TYPE Presentation YEAR 2018

OSTI

Vanguard Astra - Petascale ARM Platform for U.S. DOE/ASC Supercomputing

Younge, Andrew J.; Pedretti, Kevin P.; Laros, James H.; Hammond, Simon D.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Portals 4: Status of Specification and Implementation

Younge, Andrew J.; Grant, Ryan E.; Brightwell, Ronald B.

Abstract not provided.

TYPE Presentation YEAR 2018

OSTI

Containers in HPC: Scaling from Laptops to Supercomputers

Younge, Andrew J.; Pedretti, Kevin P.

Abstract not provided.

TYPE Presentation YEAR 2018

OSTI

Advanced Power Measurement and Control for the Trinity Supercomputer

Younge, Andrew J.; Grant, Ryan E.; Laros, James H.; Levenhagen, Michael J.; Olivier, Stephen L.; Pedretti, Kevin P.; Ward, Harry L.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Supporting High Performance Analytics with System Software for Virtualized Supercomputing

Younge, Andrew J.

Abstract not provided.

TYPE Presentation YEAR 2018

OSTI

A Comparison of Power Management Mechanisms: P-states vs. Node-Level Power Cap Control

Pedretti, Kevin P.; Grant, Ryan E.; Laros, James H.; Levenhagen, Michael J.; Olivier, Stephen L.; Ward, Harry L.; Younge, Andrew J.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI DOI

Evaluating energy and power profiling techniques for HPC workloads

2017 8th International Green and Sustainable Computing Conference, IGSC 2017

Grant, Ryan E.; Laros, James H.; Levenhagen, Michael J.; Olivier, Stephen L.; Pedretti, Kevin P.; Ward, Harry L.; Younge, Andrew J.

Advanced power measurement capabilities are becoming available on large-scale High Performance Computing (HPC) deployments. Several approaches to providing power measurements exist today, primarily through in-band (e.g., RAPL) and out-of-band (e.g., power meters) measurements. Both types of measurement can be augmented with application-level profiling; however, it can be difficult to assess the type and detail of measurement needed to obtain insight from the application power profile. This paper presents a taxonomy for classifying power profiling techniques on modern HPC platforms. Three HPC mini-applications are analyzed across three production HPC systems to examine the level of detail, scope, and complexity of these power profiles. We demonstrate that a combination of out-of-band measurement with in-band application region profiling can provide an accurate, detailed view of power usage without introducing overhead. This work also provides a set of recommendations for how best to profile HPC workloads.
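As an illustration of in-band measurement, the sketch below reads the Linux powercap view of Intel RAPL, sampling the cumulative `energy_uj` counter twice and dividing the energy difference by the elapsed time. It assumes an Intel system that exposes the `intel-rapl:0` (package 0) domain; reading the counter may require root on recent kernels. This is a generic example, not the tooling used in the paper, and out-of-band meters are not covered.

```python
import time
from pathlib import Path

# Linux powercap view of Intel RAPL package 0 (assumed present; reading
# energy_uj may require root on recent kernels).
RAPL_DIR = Path("/sys/class/powercap/intel-rapl:0")


def energy_delta_uj(before_uj: int, after_uj: int, max_range_uj: int) -> int:
    """Energy between two counter reads, assuming at most one wraparound."""
    if after_uj >= before_uj:
        return after_uj - before_uj
    # Counter wrapped back toward zero after reaching max_range_uj.
    return after_uj + max_range_uj - before_uj


def average_power_w(before_uj: int, after_uj: int, max_range_uj: int, seconds: float) -> float:
    """Average power in watts over the sampling interval."""
    return energy_delta_uj(before_uj, after_uj, max_range_uj) / 1e6 / seconds


def read_uj(name: str, rapl_dir: Path = RAPL_DIR) -> int:
    """Read one integer counter file from the powercap directory."""
    return int((rapl_dir / name).read_text())


def sample_package_power(interval_s: float = 1.0, rapl_dir: Path = RAPL_DIR) -> float:
    """In-band measurement: sample the package energy counter twice."""
    max_range = read_uj("max_energy_range_uj", rapl_dir)
    e0, t0 = read_uj("energy_uj", rapl_dir), time.monotonic()
    time.sleep(interval_s)
    e1, t1 = read_uj("energy_uj", rapl_dir), time.monotonic()
    return average_power_w(e0, e1, max_range, t1 - t0)


if __name__ == "__main__":
    try:
        print(f"package 0: {sample_package_power():.1f} W")
    except (FileNotFoundError, PermissionError) as err:
        print(f"cannot read RAPL counters: {err}")
```

The wraparound correction matters for long sampling intervals, because the counter's range is finite and it restarts from zero once it reaches `max_energy_range_uj`.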

TYPE Conference Poster YEAR 2018

Scopus OSTI

A Tale of Two Systems: Using Containers to Deploy HPC Applications on Supercomputers and Clouds

Younge, Andrew J.; Pedretti, Kevin P.; Grant, Ryan E.; Brightwell, Ronald B.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI DOI

A Tale of Two Systems: Using Containers to Deploy HPC Applications on Supercomputers and Clouds

Younge, Andrew J.; Pedretti, Kevin P.; Grant, Ryan E.; Brightwell, Ronald B.

Abstract not provided.

TYPE Conference Poster YEAR 2017

OSTI DOI

Evaluating Energy and Power Profiling Techniques for HPC Workloads

Younge, Andrew J.

Abstract not provided.

TYPE Conference Poster YEAR 2017

OSTI

Practice and Experience using Containers in HPC

Younge, Andrew J.

Abstract not provided.

TYPE Conference Poster YEAR 2017

OSTI

Enabling Diverse Software Stacks on Supercomputers Using High Performance Virtual Clusters

Proceedings - IEEE International Conference on Cluster Computing, ICCC

Younge, Andrew J.; Pedretti, Kevin P.; Grant, Ryan E.; Gaines, Brian G.; Brightwell, Ronald B.

While large-scale simulations have been the hallmark of the High Performance Computing (HPC) community for decades, Large Scale Data Analytics (LSDA) workloads are gaining attention within the scientific community not only as a processing component to large HPC simulations, but also as standalone scientific tools for knowledge discovery. With the path towards Exascale, new HPC runtime systems are also emerging in a way that differs from classical distributed computing models. However, system software for such capabilities on the latest extreme-scale DOE supercomputers needs to be enhanced to more appropriately support these types of emerging software ecosystems. In this paper, we propose the use of Virtual Clusters on advanced supercomputing resources to enable systems to support not only HPC workloads, but also emerging big data stacks. Specifically, we have deployed the KVM hypervisor within Cray's Compute Node Linux on an XC-series supercomputer testbed. We also use libvirt and QEMU to manage and provision VMs directly on compute nodes, leveraging Ethernet-over-Aries network emulation. To our knowledge, this is the first use of KVM on a true MPP supercomputer. We investigate the overhead of our solution using HPC benchmarks, evaluating both single-node performance and weak scaling of a 32-node virtual cluster. Overall, we find that single-node performance of our solution using KVM on a Cray is near-native. However, overhead increases by up to 20% as virtual cluster size increases, due to limitations of the Ethernet-over-Aries bridged network. Furthermore, we deploy Apache Spark with large data analysis workloads in a Virtual Cluster, effectively demonstrating how diverse software ecosystems can be supported by High Performance Virtual Clusters.

TYPE Conference Poster YEAR 2017

Scopus OSTI

Project Vanguard - Prototyping a large-scale ARM-based HPC platform at Sandia

Younge, Andrew J.

Abstract not provided.

TYPE Conference Poster YEAR 2017

OSTI