Publications

Results 51–100 of 218

FY18 L2 Milestone #8759 Report: Vanguard Astra and ATSE – an ARM-based Advanced Architecture Prototype System and Software Environment

Laros, James H.; Pedretti, Kevin P.; Hammond, Simon D.; Aguilar, Michael J.; Curry, Matthew L.; Grant, Ryan E.; Hoekstra, Robert J.; Klundt, Ruth A.; Monk, Stephen T.; Ogden, Jeffry B.; Olivier, Stephen L.; Scott, Randall D.; Ward, Harry L.; Younge, Andrew J.

The Vanguard program informally began in January 2017 with the submission of a white paper entitled "Sandia's Vision for a 2019 Arm Testbed" to NNSA headquarters. The program proceeded in earnest in May 2017 with an announcement by Doug Wade (Director, Office of Advanced Simulation and Computing and Institutional R&D at NNSA) that Sandia National Laboratories (Sandia) would host the first Advanced Architecture Prototype platform based on the Arm architecture. In August 2017, Sandia formed a Tri-lab team chartered to develop a robust HPC software stack for Astra to support the Vanguard program goal of demonstrating the viability of Arm in supporting ASC production computing workloads. This document describes the high-level Vanguard program goals, the Vanguard-Astra project acquisition plan and procurement up to contract placement, the initial software stack environment planned for the Vanguard-Astra platform (Astra), a description of how the communities of users will utilize the platform during the transition from the open network to the classified network, and initial performance results.


Large-Scale System Monitoring Experiences and Recommendations

Ahlgren, V.A.; Andersson, S.A.; Brandt, James M.; Cardo, N.C.; Chunduri, S.C.; Enos, J.E.; Fields, P.F.; Gentile, Ann C.; Gerber, R.B.; Gienger, M.G.; Greenseid, J.G.; Greiner, A.G.; Hadri, B.H.; He, Y.H.; Hoppe, D.H.; Kaila, U.K.; Kelly, K.K.; Klein, M.K.; Kristiansen, A.K.; Leak, S.L.; Mason, M.M.; Pedretti, Kevin P.; Piccinali, J-G.P.; Repik, Jason; Rogers, J.R.; Salminen, S.S.; Showerman, M.S.; Whitney, C.W.; Williams, J.W.

Abstract not provided.

FY18 L2 Milestone #6360 Report: Initial Capability of an Arm-based Advanced Architecture Prototype System and Software Environment

Laros, James H.; Pedretti, Kevin P.; Hammond, Simon D.; Aguilar, Michael J.; Curry, Matthew L.; Grant, Ryan E.; Hoekstra, Robert J.; Klundt, Ruth A.; Monk, Stephen T.; Ogden, Jeffry B.; Olivier, Stephen L.; Scott, Randall D.; Ward, Harry L.; Younge, Andrew J.

The Vanguard program informally began in January 2017 with the submission of a white paper entitled "Sandia's Vision for a 2019 Arm Testbed" to NNSA headquarters. The program proceeded in earnest in May 2017 with an announcement by Doug Wade (Director, Office of Advanced Simulation and Computing and Institutional R&D at NNSA) that Sandia National Laboratories (Sandia) would host the first Advanced Architecture Prototype platform based on the Arm architecture. In August 2017, Sandia formed a Tri-lab team chartered to develop a robust HPC software stack for Astra to support the Vanguard program goal of demonstrating the viability of Arm in supporting ASC production computing workloads. This document describes the high-level Vanguard program goals, the Vanguard-Astra project acquisition plan and procurement up to contract placement, the initial software stack environment planned for the Vanguard-Astra platform (Astra), a description of how the communities of users will utilize the platform during the transition from the open network to the classified network, and initial performance results.


Open science on Trinity's knights landing partition: An analysis of user job data

ACM International Conference Proceeding Series

Levy, Scott; Pedretti, Kevin P.; Ferreira, Kurt B.

High-performance computing (HPC) systems are critically important to the objectives of universities, national laboratories, and commercial companies. Because of the cost of deploying and maintaining these systems, ensuring their efficient use is imperative. Job scheduling and resource management are critically important to the efficient use of HPC systems. As a result, significant research has been conducted on how to effectively schedule user jobs on HPC systems. Developing and evaluating job scheduling algorithms, however, requires a detailed understanding of how users request resources on HPC systems. In this paper, we examine a corpus of job data that was collected on Trinity, a leadership-class supercomputer. During the stabilization period of its Intel Xeon Phi (Knights Landing) partition, it was made available to users outside of a classified environment for the Trinity Open Science Phase 2 campaign. We collected information from the resource manager about each user job that was run during this Open Science period and examine the jobs contained in this dataset. Our analysis reveals several important characteristics of the jobs submitted during the Open Science period and provides critical insight into the use of one of the most powerful supercomputers in existence. Specifically, these data provide important guidance for the design, development, and evaluation of job scheduling and resource management algorithms.
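
The kind of analysis described above can be sketched in a few lines of scripting. The example below is a minimal illustration only: the input file name and the column names (nodes, wall_seconds) are hypothetical stand-ins, and the actual Trinity Open Science dataset and its resource-manager schema are not reproduced here.

```python
# Minimal sketch of a job-trace summary: jobs per node count and each node
# count's share of total node-seconds. File name and columns are hypothetical.
import csv
from collections import Counter

def summarize_jobs(path="open_science_jobs.csv"):
    sizes = Counter()          # node count -> number of jobs
    node_seconds = Counter()   # node count -> total node-seconds consumed
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            nodes = int(row["nodes"])
            wall = float(row["wall_seconds"])
            sizes[nodes] += 1
            node_seconds[nodes] += nodes * wall
    total = sum(node_seconds.values())
    for nodes in sorted(sizes):
        share = 100.0 * node_seconds[nodes] / total
        print(f"{nodes:6d} nodes: {sizes[nodes]:6d} jobs, {share:5.1f}% of node-seconds")

if __name__ == "__main__":
    summarize_jobs()
```

Summaries of this form (job counts by size, share of node-seconds) are the kind of inputs a scheduling or resource-management study would start from.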


A comparison of power management mechanisms: P-States vs. node-level power cap control

Proceedings - 2018 IEEE 32nd International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2018

Pedretti, Kevin P.; Grant, Ryan E.; Laros, James H.; Levenhagen, Michael J.; Olivier, Stephen L.; Ward, Harry L.; Younge, Andrew J.

Large-scale HPC systems increasingly incorporate sophisticated power management control mechanisms. While these mechanisms are potentially useful for performing energy and/or power-aware job scheduling and resource management (EPA JSRM), greater understanding of their operation and performance impact on real-world applications is required before they can be applied effectively in practice. In this paper, we compare static p-state control to static node-level power cap control on a Cray XC system. Empirical experiments are performed to evaluate node-to-node performance and power usage variability for the two mechanisms. We find that static p-state control produces more predictable and higher performance characteristics than static node-level power cap control at a given power level. However, this performance benefit is at the cost of less predictable power usage. Static node-level power cap control produces predictable power usage but with more variable performance characteristics. Our results are not intended to show that one mechanism is better than the other. Rather, our results demonstrate that the mechanisms are complementary to one another and highlight their potential for combined use in achieving effective EPA JSRM solutions.
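
As a rough illustration of the two classes of mechanism being compared, the sketch below uses generic Linux interfaces rather than the Cray XC vendor controls evaluated in the paper: the cpufreq sysfs files stand in for static p-state control, and an intel_rapl package power limit stands in for a node-level power cap. The sysfs paths assume an Intel node with those drivers loaded, and writing them requires root privileges.

```python
# Illustrative Linux analogues of the two mechanisms compared above; the paper
# itself used Cray XC vendor controls. Requires root and the cpufreq and
# intel_rapl drivers; the RAPL cap is package-level, used here as a rough
# stand-in for a node-level cap.

CPUFREQ = "/sys/devices/system/cpu/cpu{cpu}/cpufreq/scaling_max_freq"
RAPL_CAP = "/sys/class/powercap/intel-rapl:0/constraint_0_power_limit_uw"

def set_max_frequency_khz(freq_khz, ncpus):
    """Static frequency (p-state-like) control: cap every core's frequency."""
    for cpu in range(ncpus):
        with open(CPUFREQ.format(cpu=cpu), "w") as f:
            f.write(str(freq_khz))

def set_package_power_cap_watts(watts):
    """Static power cap via the RAPL powercap interface (microwatt units)."""
    with open(RAPL_CAP, "w") as f:
        f.write(str(int(watts * 1_000_000)))

if __name__ == "__main__":
    set_max_frequency_khz(2_100_000, ncpus=4)  # e.g. cap cores at 2.1 GHz
    set_package_power_cap_watts(95.0)          # e.g. 95 W package cap
```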


Evaluating energy and power profiling techniques for HPC workloads

2017 8th International Green and Sustainable Computing Conference, IGSC 2017

Grant, Ryan E.; Laros, James H.; Levenhagen, Michael J.; Olivier, Stephen L.; Pedretti, Kevin P.; Ward, Harry L.; Younge, Andrew J.

Advanced power measurement capabilities are becoming available on large-scale High Performance Computing (HPC) deployments. There exist several approaches to providing power measurements today, primarily through in-band (e.g., RAPL) and out-of-band measurements (e.g., power meters). Both types of measurement can be augmented with application-level profiling; however, it can be difficult to assess the type and detail of measurement needed to obtain insight from the application power profile. This paper presents a taxonomy for classifying power profiling techniques on modern HPC platforms. Three HPC mini-applications are analyzed across three production HPC systems to examine the level of detail, scope, and complexity of these power profiles. We demonstrate that a combination of out-of-band measurement with in-band application region profiling can provide an accurate, detailed view of power usage without introducing overhead. This work also provides a set of recommendations for how to best profile HPC workloads.
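
A minimal sketch of in-band, region-level profiling in the spirit described above is shown below, using the Linux RAPL energy counter exposed through the powercap sysfs interface; out-of-band measurement with external meters is not shown. The RAPL path assumes an Intel node with the intel_rapl driver loaded, and the measured workload is a placeholder.

```python
# Sketch of in-band, region-level energy measurement using the Linux RAPL
# package energy counter. Average power over the region is the energy delta
# divided by elapsed time; the counter wrap-around is handled explicitly.
import time

RAPL = "/sys/class/powercap/intel-rapl:0"

def _read(path):
    with open(path) as f:
        return int(f.read())

def region_power(fn, *args, **kwargs):
    """Run fn() and return (result, average package power in watts)."""
    max_uj = _read(f"{RAPL}/max_energy_range_uj")
    e0, t0 = _read(f"{RAPL}/energy_uj"), time.time()
    result = fn(*args, **kwargs)
    e1, t1 = _read(f"{RAPL}/energy_uj"), time.time()
    delta_uj = (e1 - e0) % (max_uj + 1)   # handle counter wrap-around
    return result, (delta_uj / 1e6) / (t1 - t0)

if __name__ == "__main__":
    _, watts = region_power(sum, range(50_000_000))  # placeholder workload
    print(f"average package power during region: {watts:.1f} W")
```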


Characterizing MPI matching via trace-based simulation

ACM International Conference Proceeding Series

Ferreira, Kurt B.; Levy, Scott; Pedretti, Kevin P.; Grant, Ryan E.

With the increased scale expected on future leadership-class systems, detailed information about the resource usage and performance of MPI message matching provides important insights into how to maintain application performance on next-generation systems. However, obtaining MPI message matching performance data is often not possible without significant effort. A common approach is to instrument an MPI implementation to collect relevant statistics. While this approach can provide important data, collecting matching data at runtime perturbs the application’s execution, including its matching performance, and is highly dependent on the MPI library’s matchlist implementation. In this paper, we introduce a trace-based simulation approach to obtain detailed MPI message matching performance data for MPI applications without perturbing their execution. Using a number of key parallel workloads, we demonstrate that this simulator approach can rapidly and accurately characterize matching behavior. Specifically, we use our simulator to collect several important statistics about the operation of the MPI posted and unexpected queues. For example, we present data about search lengths and the duration that messages spend in the queues waiting to be matched. Data gathered using this simulation-based approach have significant potential to aid hardware designers in determining resource allocation for MPI matching functions and provide application and middleware developers with insight into the scalability issues associated with MPI message matching.
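
To make the posted/unexpected queue model concrete, the sketch below implements a toy version of the matching logic and records a per-operation search length. It is not the authors' simulator: the trace format (a list of ("recv"/"msg", source, tag) events) and the wildcard constant are invented for illustration.

```python
# Toy model of MPI matching: posted receives are searched when a message
# arrives, unexpected messages are searched when a receive is posted, and the
# number of queue entries examined per operation is recorded.
from statistics import mean

ANY = -1  # stands in for MPI_ANY_SOURCE / MPI_ANY_TAG

def matches(recv_spec, msg):
    (rsrc, rtag), (msrc, mtag) = recv_spec, msg
    return rsrc in (ANY, msrc) and rtag in (ANY, mtag)

def simulate(events):
    posted, unexpected = [], []   # the two matching queues
    search_lengths = []           # entries examined per matching attempt
    for kind, src, tag in events:
        if kind == "recv":        # a receive is posted: search unexpected queue
            for i, msg in enumerate(unexpected):
                if matches((src, tag), msg):
                    search_lengths.append(i + 1)
                    del unexpected[i]
                    break
            else:
                search_lengths.append(len(unexpected))
                posted.append((src, tag))
        else:                     # a message arrives: search posted queue
            for i, spec in enumerate(posted):
                if matches(spec, (src, tag)):
                    search_lengths.append(i + 1)
                    del posted[i]
                    break
            else:
                search_lengths.append(len(posted))
                unexpected.append((src, tag))
    return search_lengths

if __name__ == "__main__":
    trace = [("msg", 1, 7), ("recv", ANY, 7), ("recv", 2, 3), ("msg", 2, 3)]
    print("mean search length:", mean(simulate(trace)))
```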


Enabling Diverse Software Stacks on Supercomputers Using High Performance Virtual Clusters

Proceedings - IEEE International Conference on Cluster Computing, ICCC

Younge, Andrew J.; Pedretti, Kevin P.; Grant, Ryan E.; Gaines, Brian G.; Brightwell, Ronald B.

While large-scale simulations have been the hallmark of the High Performance Computing (HPC) community for decades, Large Scale Data Analytics (LSDA) workloads are gaining attention within the scientific community not only as a processing component of large HPC simulations, but also as standalone scientific tools for knowledge discovery. With the path towards Exascale, new HPC runtime systems are also emerging in a way that differs from classical distributed computing models. However, system software for such capabilities on the latest extreme-scale DOE supercomputers needs to be enhanced to more appropriately support these types of emerging software ecosystems. In this paper, we propose the use of Virtual Clusters on advanced supercomputing resources to enable systems to support not only HPC workloads, but also emerging big data stacks. Specifically, we have deployed the KVM hypervisor within Cray's Compute Node Linux on an XC-series supercomputer testbed. We also use libvirt and QEMU to manage and provision VMs directly on compute nodes, leveraging Ethernet-over-Aries network emulation. To our knowledge, this is the first known use of KVM on a true MPP supercomputer. We investigate the overhead of our solution using HPC benchmarks, evaluating both single-node performance and weak scaling of a 32-node virtual cluster. Overall, we find that the single-node performance of our solution using KVM on a Cray is very efficient, with near-native performance. However, overhead increases by up to 20% as virtual cluster size increases, due to limitations of the Ethernet-over-Aries bridged network. Furthermore, we deploy Apache Spark with large data analysis workloads in a Virtual Cluster, effectively demonstrating how diverse software ecosystems can be supported by High Performance Virtual Clusters.
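
For readers unfamiliar with the provisioning layer mentioned above, the sketch below starts a single KVM guest through the libvirt Python bindings. The domain XML is deliberately abbreviated, the disk image path is made up, and nothing here reproduces the paper's Cray-specific Ethernet-over-Aries networking; it only illustrates the generic libvirt/QEMU pattern.

```python
# Start a transient KVM guest via libvirt (requires the libvirt-python package
# and a running libvirtd). The XML below is abbreviated and the disk image
# path is a placeholder.
import libvirt

DOMAIN_XML = """
<domain type='kvm'>
  <name>vcluster-node0</name>
  <memory unit='GiB'>4</memory>
  <vcpu>4</vcpu>
  <os><type arch='x86_64'>hvm</type></os>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/images/vcluster-node0.qcow2'/>
      <target dev='vda' bus='virtio'/>
    </disk>
  </devices>
</domain>
"""

def start_guest():
    conn = libvirt.open("qemu:///system")    # connect to the local hypervisor
    try:
        dom = conn.createXML(DOMAIN_XML, 0)  # define and start a transient guest
        print("started guest:", dom.name())
    finally:
        conn.close()

if __name__ == "__main__":
    start_guest()
```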


FY17 CSSE L2 Milestone Report: Analyzing Power Usage Characteristics of Workloads Running on Trinity

Pedretti, Kevin P.

This report summarizes the work performed as part of a FY17 CSSE L2 milestone to investigate the power usage behavior of ASC workloads running on the ATS-1 Trinity platform. Techniques were developed to instrument application code regions of interest using the Power API together with the Kokkos profiling interface and Caliper annotation library. Experiments were performed to understand the power usage behavior of mini-applications and the SNL/ATDM SPARC application running on ATS-1 Trinity Haswell and Knights Landing compute nodes. A taxonomy of power measurement approaches was identified and presented, providing a guide for application developers to follow. Controlled scaling study experiments were performed on up to 2048 nodes of Trinity along with smaller scale experiments on Trinity testbed systems. Additionally, power and energy system monitoring information from Trinity was collected and archived for post analysis of "in-the-wild" workloads. Results were analyzed to assess the sensitivity of the workloads to ATS-1 compute node type (Haswell vs. Knights Landing), CPU frequency control, node-level power capping control, OpenMP configuration, Knights Landing on-package memory configuration, and algorithm/solver configuration. Overall, this milestone lays groundwork for addressing the long-term goal of determining how to best use and operate future ASC platforms to achieve the greatest benefit subject to a constrained power budget.


The Portals 4.1 Network Programming Interface

Barrett, Brian W.; Brightwell, Ronald B.; Grant, Ryan E.; Hemmert, Karl S.; Pedretti, Kevin P.; Wheeler, Kyle W.; Underwood, Keith; Riesen, Rolf R.; Maccabe, Arthur B.; Hudson, Trammel H.

This report presents a specification for the Portals 4 network programming interface. Portals 4 is intended to allow scalable, high-performance network communication between nodes of a parallel computing system. Portals 4 is well suited to massively parallel processing and embedded systems. Portals 4 represents an adaptation of the data movement layer developed for massively parallel processing platforms, such as the 4500-node Intel TeraFLOPS machine. Sandia's Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4 is targeted to the next generation of machines employing advanced network interface architectures that support enhanced offload capabilities.


High Performance Computing - Power Application Programming Interface Specification Version 2.0

Laros, James H.; Grant, Ryan E.; Levenhagen, Michael J.; Olivier, Stephen L.; Pedretti, Kevin P.; Ward, Harry L.; Younge, Andrew J.

Measuring and controlling the power and energy consumption of high performance computing systems by various components in the software stack is an active research area. Implementations in lower level software layers are beginning to emerge in some production systems, which is very welcome. To be most effective, a portable interface to measurement and control features would significantly facilitate participation by all levels of the software stack. We present a proposal for a standard power Application Programming Interface (API) that endeavors to cover the entire software space, from generic hardware interfaces to the input from the computer facility manager.


Standardizing Power Monitoring and Control at Exascale

Computer

Grant, Ryan E.; Levenhagen, Michael J.; Olivier, Stephen L.; DeBonis, David D.; Pedretti, Kevin P.; Laros, James H.

Power API - the result of collaboration among national laboratories, universities, and major vendors - provides a range of standardized power management functions, from application-level control and measurement to facility-level accounting, including real-time and historical statistics gathering. Support is already available for Intel and AMD CPUs and standalone measurement devices.


High Performance Computing - Power Application Programming Interface Specification Version 1.4

Laros, James H.; DeBonis, David D.; Grant, Ryan E.; Kelly, Suzanne M.; Levenhagen, Michael J.; Olivier, Stephen L.; Pedretti, Kevin P.

Measuring and controlling the power and energy consumption of high performance computing systems by various components in the software stack is an active research area [13, 3, 5, 10, 4, 21, 19, 16, 7, 17, 20, 18, 11, 1, 6, 14, 12]. Implementations in lower level software layers are beginning to emerge in some production systems, which is very welcome. To be most effective, a portable interface to measurement and control features would significantly facilitate participation by all levels of the software stack. We present a proposal for a standard power Application Programming Interface (API) that endeavors to cover the entire software space, from generic hardware interfaces to the input from the computer facility manager.


A cross-enclave composition mechanism for exascale system software

Proceedings of the 6th International Workshop on Runtime and Operating Systems for Supercomputers, ROSS 2016 - In conjunction with HPDC 2016

Evans, Noah; Pedretti, Kevin P.; Kocoloski, Brian; Lange, John; Lang, Michael; Bridges, Patrick G.

As supercomputers move to exascale, the number of cores per node continues to increase, but the I/O bandwidth between nodes is increasing more slowly. This leads to computational power outstripping I/O bandwidth. This growth, in turn, encourages moving as much of an HPC workflow as possible onto the node in order to minimize data movement. One particular method of application composition, enclaves, co-locates different operating systems and runtimes on the same node where they communicate by in situ communication mechanisms. In this work, we describe a mechanism for communicating between composed applications. We implement a mechanism using Copy-on-Write cooperating with XEMEM shared memory to provide consistent, implicitly unsynchronized communication across enclaves. We then evaluate this mechanism using a composed application and analytics between the Kitten Lightweight Kernel and Linux on top of the Hobbes Operating System and Runtime. These results show a 3% overhead compared to an application running in isolation, demonstrating the viability of this approach.
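
XEMEM and the Hobbes enclave model are specific to that system software stack, but the underlying pattern, a simulation exporting a named memory region that an analytics component attaches to, can be sketched generically with Python's multiprocessing.shared_memory. The segment name and payload below are made up for illustration.

```python
# Generic shared-memory analogue of a cross-enclave channel: a "simulation"
# process creates and fills a named segment, and an "analytics" process
# attaches to it by name and reads the data.
from multiprocessing import Process, shared_memory

def analytics(name, size):
    seg = shared_memory.SharedMemory(name=name)   # attach to exported region
    print("analytics side saw checksum:", sum(seg.buf[:size]))
    seg.close()

if __name__ == "__main__":
    seg = shared_memory.SharedMemory(name="hobbes_demo", create=True, size=1024)
    seg.buf[:4] = bytes([1, 2, 3, 4])             # "simulation" writes data
    p = Process(target=analytics, args=("hobbes_demo", 4))
    p.start()
    p.join()
    seg.close()
    seg.unlink()                                  # release the named segment
```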


High Performance Computing - Power Application Programming Interface Specification

Laros, James H.; Kelly, Suzanne M.; Pedretti, Kevin P.; Grant, Ryan E.; Olivier, Stephen L.; Levenhagen, Michael J.; DeBonis, David D.

Measuring and controlling the power and energy consumption of high performance computing systems by various components in the software stack is an active research area [13, 3, 5, 10, 4, 21, 19, 16, 7, 17, 20, 18, 11, 1, 6, 14, 12]. Implementations in lower level software layers are beginning to emerge in some production systems, which is very welcome. To be most effective, a portable interface to measurement and control features would significantly facilitate participation by all levels of the software stack. We present a proposal for a standard power Application Programming Interface (API) that endeavors to cover the entire software space, from generic hardware interfaces to the input from the computer facility manager.
