Publications Search

Modern high performance computers connect hundreds of thousands of endpoints and employ thousands of switches. This allows for a great deal of freedom in the design of the network topology. At the same time, due to the sheer numbers and complexity involved, it becomes more challenging to easily distinguish between promising and improper designs. With ever increasing line rates and advances in optical interconnects, there is a need for renewed design methodologies that comprehensively capture the requirements and expose tradeoffs expeditiously in this complex design space. We introduce a systematic approach, based on Generalized Moore Graphs, allowing one to quickly gauge the ideal level of connectivity required for a given number of end-points and traffic hypothesis, and to collect insight on the role of the switch radix in the topology cost. Based on this approach, we present a methodology for the identification of Pareto-optimal topologies. We apply our method to a practical case with 25,000 nodes and present the results.

TYPE Conference Poster YEAR 2015

Scopus OSTI

Preparations for Trinity KNL

Hammond, Simon D.

Abstract not provided.

TYPE Presentation YEAR 2015

OSTI

An evaluation of MPI message rate on hybrid-core processors

International Journal of High Performance Computing Applications

Barrett, Brian W.; Brightwell, Ronald B.; Grant, Ryan E.; Hammond, Simon D.; Hemmert, Karl S.

Power and energy concerns are motivating chip manufacturers to consider future hybrid-core processor designs that may combine a small number of traditional cores optimized for single-thread performance with a large number of simpler cores optimized for throughput performance. This trend is likely to impact the way in which compute resources for network protocol processing functions are allocated and managed. In particular, the performance of MPI match processing is critical to achieving high message throughput. In this paper, we analyze the ability of simple and more complex cores to perform MPI matching operations for various scenarios in order to gain insight into how MPI implementations for future hybrid-core processors should be designed.

TYPE Journal Article YEAR 2014

Scopus OSTI DOI

Sandia's Advanced Architecture Test Beds

Laros, James H.; Ang, James A.; Hammond, Simon D.; Kelly, Suzanne M.

Abstract not provided.

TYPE Conference Poster YEAR 2014

OSTI

Trinity Benchmarks on Xeon Phi (Knights Corner)

Rajan, Mahesh R.; Doerfler, Douglas W.; Hammond, Simon D.; Trott, Christian R.; Barrett, Richard F.

Abstract not provided.

TYPE Conference Poster YEAR 2014

OSTI

The Structural Simulation Toolkit

Rodrigues, Arun; Moore, Branden J.; Hammond, Simon D.; Hemmert, Karl S.

Abstract not provided.

TYPE Presentation YEAR 2014

OSTI DOI

SNAP: Strong Scaling High Fidelity Molecular

Trott, Christian R.; Hammond, Simon D.; Thompson, Aidan P.

Abstract not provided.

TYPE Presentation YEAR 2014

OSTI

Abstract Machine Models

Hammond, Simon D.

Abstract not provided.

TYPE Presentation YEAR 2014

OSTI

Characterizing Mini-App Workloads

Hammond, Simon D.

Abstract not provided.

TYPE Presentation YEAR 2014

OSTI

SST Cassini Component

Hammond, Simon D.

Abstract not provided.

TYPE Presentation YEAR 2014

OSTI

Two Years of Co-Design

Hammond, Simon D.

Abstract not provided.

TYPE Presentation YEAR 2014

OSTI

Abstract machine models and proxy architectures for exascale computing

Proceedings of Co-HPC 2014: 1st International Workshop on Hardware-Software Co-Design for High Performance Computing - Held in Conjunction with SC 2014: The International Conference for High Performance Computing, Networking, Storage and Analysis

Ang, James A.; Barrett, R.F.; Benner, R.E.; Burke, D.; Chan, C.; Cook, J.; Donofrio, D.; Hammond, Simon D.; Hemmert, Karl S.; Kelly, Suzanne M.; Le, H.; Leung, Vitus J.; Resnick, D.R.; Rodrigues, Arun; Shalf, J.; Stark, Dylan S.; Unat, D.; Wright, N.J.

To achieve exascale computing, fundamental hardware architectures must change. This will significantly impact scientific applications that run on current high performance computing (HPC) systems, many of which codify years of scientific domain knowledge and refinements for contemporary computer systems. To adapt to exascale architectures, developers must be able to reason about new hardware and determine what programming models and algorithms will provide the best blend of performance and energy efficiency in the future. An abstract machine model is designed to expose to the application developers and system software only the aspects of the machine that are important or relevant to performance and code structure. These models are intended as communication aids between application developers and hardware architects during the co-design process. A proxy architecture is a parameterized version of an abstract machine model, with parameters added to elucidate potential speeds and capacities of key hardware components. These more detailed architectural models enable discussion among the developers of analytic models and simulators and computer hardware architects and they allow for application performance analysis, system software development, and hardware optimization opportunities. In this paper, we present a set of abstract machine models and show how they might be used to help software developers prepare for exascale. We then apply parameters to one of these models to demonstrate how a proxy architecture can enable a more concrete exploration of how well application codes map onto future architectures.

TYPE Conference Poster YEAR 2014

Scopus OSTI DOI

SNAP: Strong scaling high fidelity molecular dynamics simulations on leadership-class computing platforms

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Trott, Christian R.; Hammond, Simon D.; Thompson, Aidan P.

The rapidly improving compute capability of contemporary processors and accelerators is providing the opportunity for significant increases in the accuracy and fidelity of scientific calculations. In this paper we present performance studies of a new molecular dynamics (MD) potential called SNAP. The SNAP potential has shown great promise in accurately reproducing physics and chemistry not described by simpler potentials. We have developed new algorithms to exploit high single-node concurrency provided by three different classes of machine: the Titan GPU-based system operated by Oak Ridge National Laboratory, the combined Sequoia and Vulcan BlueGene/Q machines located at Lawrence Livermore National Laboratory, and the large-scale Intel Sandy Bridge system, Chama, located at Sandia. Our analysis focuses on strong scaling experiments with approximately 246,000 atoms over the range 1-122,880 nodes on Sequoia/Vulcan and 40-18,630 nodes on Titan. We compare these machines in terms of both simulation rate and power efficiency. We find that node performance correlates with power consumption across the range of machines, except for the case of extreme strong scaling, where more powerful compute nodes show greater efficiency. This study is a unique assessment of a challenging, scientifically relevant calculation running on several of the world's leading contemporary production supercomputing platforms. © 2014 Springer International Publishing.

TYPE Conference YEAR 2014

Scopus OSTI DOI

Application Memory Analysis

Hammond, Simon D.

Abstract not provided.

TYPE Presentation YEAR 2014

OSTI

Reducing the bulk of the bulk synchronous parallel model

Parallel Processing Letters

Barrett, Richard F.; Vaughan, Courtenay T.; Hammond, Simon D.

For over two decades the dominant means for enabling portable performance of computational science and engineering applications on parallel processing architectures has been the bulk-synchronous parallel programming (BSP) model. Code developers, motivated by performance considerations to minimize the number of messages transmitted, have typically pursued a strategy of aggregating message data into fewer, larger messages. Emerging and future high-performance architectures, especially those seen as targeting Exascale capabilities, provide motivation and capabilities for revisiting this approach. In this paper we explore alternative configurations within the context of a large-scale complex multi-physics application and a proxy that represents its behavior, presenting results that demonstrate some important advantages as the number of processors increases in scale.

TYPE Journal Article YEAR 2013

OSTI DOI

The Path to Exascale Experiences porting and debugging for Intel Xeon Phi

Hammond, Simon D.

Abstract not provided.

TYPE Presentation YEAR 2013

OSTI

Performance on Advanced Systems Test Beds

Trott, Christian R.; Hammond, Simon D.; Kelly, Suzanne M.; Laros, James H.; Ang, James A.

Abstract not provided.

TYPE Presentation YEAR 2013

OSTI

I don't wanna grow up... stuck at predictive capability maturity model level zero!

Rider, William J.; Kelly, Suzanne M.; Barrett, Richard F.; Hammond, Simon D.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

NNSA/ASC Test Bed Update

Hammond, Simon D.; Barrett, Richard F.; Vaughan, Courtenay T.; Trott, Christian R.; Laros, James H.; Kelly, Suzanne M.; Ang, James A.

Abstract not provided.

TYPE Presentation YEAR 2013

OSTI

SST and Test-Bed Hack-a-thon

Hammond, Simon D.; Rodrigues, Arun; Kelly, Suzanne M.; Ang, James A.

Abstract not provided.

TYPE Presentation YEAR 2013

OSTI

A Glimpse into the Next Decade of Supercomputing: An Overview of Sandia's Advanced Test Bed Project

Hammond, Simon D.

Abstract not provided.

TYPE Presentation YEAR 2013

OSTI

I Don't Wanna Grow Up...Stuck at Predictive Capability Maturity Model Level Zero

Kelly, Suzanne M.; Rider, William J.; Barrett, Richard F.; Hammond, Simon D.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

Application Explorations for Future Interconnects

Barrett, Richard F.; Vaughan, Courtenay T.; Hammond, Simon D.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

GPU Acceleration of Data Assembly in Finite Element Methods and Its Energy Implications

Barrett, Richard F.; Hammond, Simon D.; Hsieh, Mingyu H.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

The Impact of Hybrid-Core Processors on MPI Message Rate

Barrett, Brian W.; Brightwell, Ronald B.; Hemmert, Karl S.; Hammond, Simon D.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

The Impact of Hybrid-Core Processors on MPI Message Rate

Barrett, Brian W.; Brightwell, Ronald B.; Hammond, Simon D.; Hemmert, Karl S.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

SST (micro) Introduction - Presentation to SST Hack-a-thon Attendees

Rodrigues, Arun; Hammond, Simon D.; Kelly, Suzanne M.

Abstract not provided.

TYPE Presentation YEAR 2013

OSTI

SST Hack-a-thon Component Overview

Hammond, Simon D.

Abstract not provided.

TYPE Presentation YEAR 2013

OSTI

Mantevo Suite 1.0

Barrett, Richard F.; Willenbring, James M.; Hammond, Simon D.

Abstract not provided.

TYPE Presentation YEAR 2013

OSTI

SST and ExMatEx Update

Hammond, Simon D.; Rodrigues, Arun; Kelly, Suzanne M.; Vandyke, John P.

Abstract not provided.

TYPE Presentation YEAR 2013

OSTI

Experiences with Xeon Phi

Hammond, Simon D.; Rajamanickam, Sivasankaran R.; Ang, James A.; Barrett, Richard F.; Doerfler, Douglas W.; Heroux, Michael A.; Laros, James H.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

Application Explorations for Future Interconnects

Barrett, Richard F.; Vaughan, Courtenay T.; Hammond, Simon D.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

Using Miniapplications in a Mantevo Framework for Optimizing Sandia's SPARC CFD Code on Multi-Core Many-Core and GPU-Accelerated Platforms

Barrett, Richard F.; Laros, James H.; Hammond, Simon D.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

Finding an On-Ramp to the Exascale Highway

Hammond, Simon D.

Abstract not provided.

TYPE Presentation YEAR 2013

OSTI

Emerging HPC Systems and Next Generation Engineering Analysis Applications

Ang, James A.; Barrett, Richard F.; Hammond, Simon D.; Rodrigues, Arun

Abstract not provided.

TYPE Presentation YEAR 2012

OSTI

Using Miniapplications in a Mantevo Framework for Optimizing Sandia's SPARC CFD Code on Multi-Core Many-Core and GPU-Accelerated Compute Platforms

Hammond, Simon D.; Laros, James H.

Abstract not provided.

TYPE Conference YEAR 2012

OSTI

Navigating an Evolutionary Fast Path to Exascale

Barrett, Richard F.; Hammond, Simon D.; Vaughan, Courtenay T.; Doerfler, Douglas W.; Heroux, Michael A.

Abstract not provided.

TYPE Conference YEAR 2012

OSTI

Assessing the predictive capabilities of mini-applications

Barrett, Richard F.; Crozier, Paul C.; Doerfler, Douglas W.; Hammond, Simon D.; Heroux, Michael A.; Lin, Paul L.; Trucano, Timothy G.; Vaughan, Courtenay T.; Williams, Alan B.

Abstract not provided.

TYPE Conference YEAR 2012

OSTI

Sandia's Appro "Compton" Cluster

Hammond, Simon D.

Abstract not provided.

TYPE Conference YEAR 2012

OSTI

Unprecedented Scalability and Performance of the new NNSA Tri-Lab Capacity Cluster 2 (TLCC2)

Rajan, Mahesh R.; Doerfler, Douglas W.; Lin, Paul L.; Hammond, Simon D.; Barrett, Richard F.; Vaughan, Courtenay T.

Abstract not provided.

TYPE Conference YEAR 2012

OSTI

Early Experiences with Co-Design

Hammond, Simon D.; Ang, James A.; Barrett, Richard F.; Doerfler, Douglas W.; Heroux, Michael A.; Laros, James H.

Abstract not provided.

TYPE Conference YEAR 2012

OSTI

Unprecedented Scalability and Performance of the new NNSA Tri-Lab Capacity Cluster 2 (TLCC2)

Doerfler, Douglas W.; Lin, Paul L.; Hammond, Simon D.; Barrett, Richard F.; Vaughan, Courtenay T.

Abstract not provided.

TYPE Conference YEAR 2012

OSTI

Early Experiences with Intel MIC Architecture

Ang, James A.; Kelly, Suzanne M.; Hammond, Simon D.; Barrett, Richard F.; Levenhagen, Michael J.; Rodrigues, Arun; Pedretti, Kevin P.; Laros, James H.

Abstract not provided.

TYPE Conference YEAR 2012

OSTI

Towards Automated Memory Model Generation Via Event Tracing

Computer Journal

Hammond, Simon D.

The importance of memory performance and capacity is a growing concern for high performance computing laboratories around the world. It has long been recognized that improvements in processor speed exceed the rate of improvement in dynamic random access memory speed and, as a result, memory access times can be the limiting factor in high performance scientific codes. The use of multi-core processors exacerbates this problem with the rapid growth in the number of cores not being matched by similar improvements in memory capacity, increasing the likelihood of memory contention. In this paper, we present WMTools, a lightweight memory tracing tool and analysis framework for parallel codes, which is able to identify peak memory usage and also analyse per-function memory use over time. An evaluation of WMTools, in terms of its effectiveness and also its overheads, is performed using nine established scientific applications/benchmark codes representing a variety of programming languages and scientific domains. We also show how WMTools can be used to automatically generate a parameterized memory model for one of these applications, a two-dimensional non-linear magnetohydrodynamics application, Lare2D. Through the memory model we are able to identify an unexpected growth term which becomes dominant at scale. With a refined model we are able to predict memory consumption with under 7% error.

TYPE Journal Article YEAR 2012

OSTI DOI