A Tale of Two Systems: Using Containers to Deploy HPC Applications on Supercomputers and Clouds

The metrics used for evaluating energy saving techniques for future HPC systems are critical to the correct assessment of proposed methods. Current predictions forecast that overcoming reduced system reliability, increased power requirements and energy consumption will be a major design challenge for future systems. Modern runtime energy-saving research efforts do not take into account the energy spent providing reliability. They also do not account for the increase in the probability of failure during application execution due to runtime overhead from energy saving methods. While this is very reasonable for current systems, it is insufficient for future generation systems. By taking into account the energy consumption ramifications of increased runtimes on system reliability, better energy saving techniques can be developed. This paper demonstrates how to determine the impact of runtime energy conservation methods within the context of failure-prone large scale systems. In addition, a survey of several energy savings methodologies is conducted and an analysis is performed with respect to their effectiveness in an environment in which failures occur.

More Details

TYPE Conference YEAR 2014

Scopus OSTI DOI

An evaluation of MPI message rate on hybrid-core processors

International Journal of High Performance Computing Applications

Barrett, Brian W.; Brightwell, Ronald B.; Grant, Ryan E.; Hammond, Simon D.; Hemmert, Karl S.

Power and energy concerns are motivating chip manufacturers to consider future hybrid-core processor designs that may combine a small number of traditional cores optimized for single-thread performance with a large number of simpler cores optimized for throughput performance. This trend is likely to impact the way in which compute resources for network protocol processing functions are allocated and managed. In particular, the performance of MPI match processing is critical to achieving high message throughput. In this paper, we analyze the ability of simple and more complex cores to perform MPI matching operations for various scenarios in order to gain insight into how MPI implementations for future hybrid-core processors should be designed.

More Details

TYPE Journal Article YEAR 2014

Scopus OSTI DOI

Comparing, contrasting, generalizing, and integrating two current designs for fault-tolerant MPI

ACM International Conference Proceeding Series

Proposed for publication in Future Generation Computer Systems.

Ferreira, Kurt; Brightwell, Ronald B.

Abstract not provided.

More Details

TYPE Journal Article YEAR 2013

OSTI

Addressing the System Software Challenges for Converged Simulation and Analysis on Extreme-Scale Systems

Brightwell, Ronald B.

Abstract not provided.

More Details

TYPE Conference YEAR 2013

OSTI

A GPU-based Checkpoint Compression Study Size Does Matter -- More Than Speed Anyway

Ferreira, Kurt; Brightwell, Ronald B.

Abstract not provided.

More Details

TYPE Conference YEAR 2013

OSTI

The Portals 4.0 network programming interface

Brightwell, Ronald B.; Pedretti, Kevin P.; Wheeler, Kyle B.; Hemmert, Karl S.; Barrett, Brian B.

This report presents a specification for the Portals 4.0 network programming interface. Portals 4.0 is intended to allow scalable, high-performance network communication between nodes of a parallel computing system. Portals 4.0 is well suited to massively parallel processing and embedded systems. Portals 4.0 represents an adaption of the data movement layer developed for massively parallel processing platforms, such as the 4500-node Intel TeraFLOPS machine. Sandias Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4.0 is targeted to the next generation of machines employing advanced network interface architectures that support enhanced offload capabilities.

More Details

TYPE SAND Report YEAR 2012

OSTI DOI

Programming and Run-Time Models for Heavily Threaded Systems: Run-Time Systems Panel

Brightwell, Ronald B.

Abstract not provided.

More Details

TYPE Presentation YEAR 2010

OSTI

Palacios and kitten: New high performance operating systems for scalable virtualized and native supercomputing

Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing, IPDPS 2010

Lange, John; Pedretti, Kevin P.; Hudson, Trammell; Dinda, Peter; Cui, Zheng; Xia, Lei; Bridges, Patrick; Gocke, Andy; Jaconette, Steven; Levenhagen, Michael J.; Brightwell, Ronald B.

Palacios is a new open-source VMM under development at Northwestern University and the University of New Mexico that enables applications executing in a virtualized environment to achieve scalable high performance on large machines. Palacios functions as a modularized extension to Kitten, a high performance operating system being developed at Sandia National Laboratories to support large-scale supercomputing applications. Together, Palacios and Kitten provide a thin layer over the hardware to support full-featured virtualized environments alongside Kitten's lightweight native environment. Palacios supports existing, unmodified applications and operating systems by using the hardware virtualization technologies in recent AMD and Intel processors. Additionally, Palacios leverages Kitten's simple memory management scheme to enable low-overhead pass-through of native devices to a virtualized environment. We describe the design, implementation, and integration of Palacios and Kitten. Our benchmarks show that Palacios provides near native (within 5%), scalable performance for virtualized environments running important parallel applications. This new architecture provides an incremental path for applications to use supercomputers, running specialized lightweight host operating systems, that is not significantly performance-compromised. © 2010 IEEE.

More Details

TYPE Conference YEAR 2010

Scopus OSTI

Advanced Architectures for Exascale Computing

Brightwell, Ronald B.

Abstract not provided.

More Details

TYPE Presentation YEAR 2010

OSTI

Challenges for high-performance networking for exascale computing

Brightwell, Ronald B.; Barrett, Brian B.; Hemmert, Karl S.

Achieving the next three orders of magnitude performance increase to move from petascale to exascale computing will require a significant advancements in several fundamental areas. Recent studies have outlined many of the challenges in hardware and software that will be needed. In this paper, we examine these challenges with respect to high-performance networking. We describe the repercussions of anticipated changes to computing and networking hardware and discuss the impact that alternative parallel programming models will have on the network software stack. We also present some ideas on possible approaches that address some of these challenges.

More Details

TYPE Conference YEAR 2010

OSTI

Exploiting Direct Access Shared Memory for MPI on Multi-Core Processors

International Journal of High-Performance Computing Applications

Brightwell, Ronald B.

Abstract not provided.

More Details

TYPE Journal Article YEAR 2010

OSTI DOI

rMPI : increasing fault resiliency in a message-passing environment

Ferreira, Kurt; Riesen, Rolf; Oldfield, Ron A.; Laros, James H.; Pedretti, Kevin P.; Stearley, Jon S.; Brightwell, Ronald B.

Abstract not provided.

More Details

TYPE Conference YEAR 2010

OSTI

Transparent redundant computing with MPI

Brightwell, Ronald B.; Ferreira, Kurt

Extreme-scale parallel systems will require alternative methods for applications to maintain current levels of uninterrupted execution. Redundant computation is one approach to consider, if the benefits of increased resiliency outweigh the cost of consuming additional resources. We describe a transparent redundancy approach for MPI applications and detail two different implementations that provide the ability to tolerate a range of failure scenarios, including loss of application processes and connectivity.We compare these two approaches and show performance results from micro-benchmarks that bound worst-case message passing performance degradation.We propose several enhancements that could lower the overhead of providing resiliency through redundancy.

More Details

TYPE Conference YEAR 2010

OSTI

On the path to exascale

International Journal of Distributed Systems and Technologies

Alvin, Kenneth F.; Barrett, Brian B.; Brightwell, Ronald B.; Dosanjh, Sudip S.; Geist, Al; Hemmert, Karl S.; Heroux, Michael; Kothe, Doug; Murphy, Richard C.; Nichols, Jeff; Oldfield, Ron A.; Rodrigues, Arun; Vetter, Jeffrey S.

There is considerable interest in achieving a 1000 fold increase in supercomputing power in the next decade, but the challenges are formidable. In this paper, the authors discuss some of the driving science and security applications that require Exascale computing (a million, trillion operations per second). Key architectural challenges include power, memory, interconnection networks and resilience. The paper summarizes ongoing research aimed at overcoming these hurdles. Topics of interest are architecture aware and scalable algorithms, system simulation, 3D integration, new approaches to system-directed resilience and new benchmarks. Although significant progress is being made, a broader international program is needed.

More Details

TYPE Journal Article YEAR 2010

Scopus OSTI

A lightweight, GPU-based software RAID system

Brightwell, Ronald B.; Ward, Harry L.

Abstract not provided.

More Details

TYPE Conference YEAR 2010

OSTI

System Software Research for Extreme-Scale Computing

Oldfield, Ron A.; Brightwell, Ronald B.; Pedretti, Kevin P.; Riesen, Rolf; Ferreira, Kurt; Kelly, Suzanne M.; Laros, James H.

Abstract not provided.

More Details

TYPE Presentation YEAR 2010

OSTI

Parallel phase model: A programming model for high-end parallel machines with manycores

Proceedings of the International Conference on Parallel Processing

Brightwell, Ronald B.

Abstract not provided.

More Details

TYPE Conference YEAR 2006

OSTI

A Simple Synchronous Distributed-Memory Algorithm for the HPCC RandomAccess Benchmark

Plimpton, Steven J.; Brightwell, Ronald B.; Vaughan, Courtenay T.; Underwood, Keith

Abstract not provided.

More Details

TYPE Conference YEAR 2006

OSTI

A Simple Synchronous Distributed-Memory Algorithm for the HPCC RandomAccess Benchmark

Underwood, Keith; Plimpton, Steven J.; Brightwell, Ronald B.; Vaughan, Courtenay T.

Abstract not provided.

More Details

TYPE Conference YEAR 2006

OSTI

Brightwell, Ronald B.; Brightwell, Ronald B.; Maccabe, Arthur B.; Riesen, Rolf

Abstract not provided.

More Details

TYPE Conference YEAR 2003

OSTI

Programming Paradigms for Massively Parallel Computers: LDRD Project Final Report

Brightwell, Ronald B.

This technical report presents the initial proposal and renewable proposals for an LDRD project whose intended goal was to enable applications to take full advantage of the hardware available on Sandia's current and future massively parallel supercomputers by analyzing various ways of combining distributed-memory and shared-memory programming models. Despite Sandia's enormous success with distributed-memory parallel machines and the message-passing programming model, clusters of shared-memory processors appeared to be the massively parallel architecture of the future at the time this project was proposed. They had hoped to analyze various hybrid programming models for their effectiveness and characterize the types of application to which each model was well-suited. The report presents the initial research proposal and subsequent continuation proposals that highlight the proposed work and summarize the accomplishments.

More Details

TYPE Report YEAR 2001

OSTI DOI

Scalability limitations of VIA-based technologies in supporting MPI

Brightwell, Ronald B.; Maccabe, Arthur B.

This paper analyzes the scalability limitations of networking technologies based on the Virtual Interface Architecture (VIA) in supporting the runtime environment needed for an implementation of the Message Passing Interface. The authors present an overview of the important characteristics of VIA and an overview of the runtime system being developed as part of the Computational Plant (Cplant) project at Sandia National Laboratories. They discuss the characteristics of VIA that prevent implementations based on this system to meet the scalability and performance requirements of Cplant.

More Details

TYPE Conference YEAR 2000

OSTI

Scalability and Performance of a Large Linux Cluster

Journal of Parallel and Distributed Computing

Brightwell, Ronald B.; Plimpton, Steven J.

In this paper the authors present performance results from several parallel benchmarks and applications on a 400-node Linux cluster at Sandia National Laboratories. They compare the results on the Linux cluster to performance obtained on a traditional distributed-memory massively parallel processing machine, the Intel TeraFLOPS. They discuss the characteristics of these machines that influence the performance results and identify the key components of the system software that they feel are important to allow for scalability of commodity-based PC clusters to hundreds and possibly thousands of processors.

More Details

TYPE Journal Article YEAR 2000

OSTI