Publications Search

Canonical Polyadic tensor decomposition using alternating Poisson regression (CP-APR) is an effective analysis tool for large sparse count datasets. One variant, which uses projected damped Newton optimization for the row subproblems (PDNR), offers quadratic convergence and is amenable to parallelization. Despite its potential effectiveness, PDNR performance on modern high performance computing (HPC) systems is not well understood. To remedy this, we have developed a parallel implementation of PDNR using Kokkos, a performance portable parallel programming framework that supports efficient execution of a single code base on multiple HPC systems. We demonstrate that the performance of parallel PDNR can be poor if load imbalance associated with the irregular distribution of nonzero entries in the tensor data is not addressed. Preliminary results using tensors from the FROSTT data set indicate that using multiple kernels to address this imbalance when solving the PDNR row subproblems in parallel can improve performance, with up to 80% speedup on CPUs and 10-fold speedup on NVIDIA GPUs.

TYPE Conference Poster YEAR 2020

OSTI Scopus

Chapel performance for some graph analytics

Barrett, Richard F.

Abstract not provided.

TYPE Presentation YEAR 2020

OSTI

SparTen: Leveraging Kokkos for On-node Parallelism in a Second-Order Method for Fitting Canonical Polyadic Tensor Models to Poisson Data

Teranishi, Keita; Dunlavy, Daniel M.; Myers, Jeremy M.; Barrett, Richard F.

Abstract not provided.

TYPE Conference Poster YEAR 2020

OSTI

Exploring chapel productivity using some graph algorithms

Proceedings - 2020 IEEE 34th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2020

Barrett, Richard F.; Cook, Jeanine; Olivier, Stephen L.; Aaziz, Omar R.; Jenkins, Chris; Vaughan, Courtenay T.

A broad set of data science and engineering questions may be organized as graphs, providing a powerful means for describing relational data. Although experts now routinely compute graph algorithms on huge, unstructured graphs using high performance computing (HPC) or cloud resources, this practice hasn't yet broken into the mainstream. Such computations require great expertise, yet users often need rapid prototyping and development to quickly customize existing code. Toward that end, we are exploring the use of the Chapel programming language as a means of making some important graph analytics more accessible, examining the breadth of characteristics that would make for a productive programming environment, one that is expressive, performant, portable, and robust.

TYPE Conference Poster YEAR 2020

DOI OSTI Scopus

Some Linear Algebra-based Graph Analytics using Chapel

Barrett, Richard F.

Abstract not provided.

TYPE Presentation YEAR 2020

OSTI

Abstract Machine Models and Proxy Architectures for Exascale Computing

Ang, James A.; Barrett, Richard F.; Benner, Robert E.; Burke, Daniel; Chan, Cy; Cook, Jeanine; Daley, Christopher S.; Donofrio, David; Hammond, Simon; Hemmert, Karl S.; Hoekstra, Robert J.; Ibrahim, Khaled; Kelly, Suzanne M.; Le, Hoang; Leung, Vitus J.; Michelogiannakis, George; Resnick, David R.; Rodrigues, Arun; Shalf, John; Stark, Dylan; Unat, D.; Wright, Nick J.; Voskuilen, Gwendolyn R.

To achieve exascale computing, fundamental hardware architectures must change. The most significant consequence of this assertion is the impact on the scientific and engineering applications that run on current high performance computing (HPC) systems, many of which codify years of scientific domain knowledge and refinements for contemporary computer systems. In order to adapt to exascale architectures, developers must be able to reason about new hardware and determine what programming models and algorithms will provide the best blend of performance and energy efficiency into the future. While many details of the exascale architectures are undefined, an abstract machine model is designed to allow application developers to focus on the aspects of the machine that are important or relevant to performance and code structure. These models are intended as communication aids between application developers and hardware architects during the co-design process. We use the term proxy architecture to describe a parameterized version of an abstract machine model, with the parameters added to elucidate potential speeds and capacities of key hardware components. These more detailed architectural models are formulated to enable discussion between the developers of analytic models and simulators and computer hardware architects. They allow for application performance analysis and hardware optimization opportunities. In this report our goal is to provide the application development community with a set of models that can help software developers prepare for exascale. In addition, through the use of proxy architectures, we can enable a more concrete exploration of how well new and evolving application codes map onto future architectures. This second version of the document addresses system scale considerations and provides a system-level abstract machine model with proxy architecture information.

TYPE SAND Report YEAR 2019

DOI OSTI

Performance portable parallel sparse CP-APR tensor decompositions

Barrett, Richard F.

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

Performance and Parallelization of CP-Alternate Poisson Regression Sparse Tensor Decomposition

Teranishi, Keita; Hollman, David S.; Barrett, Richard F.; Myers, Jeremy M.; Dunlavy, Daniel M.

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

Development of parallel sparse CP-APR tensor decomposition solvers

Teranishi, Keita; Hollman, David S.; Dunlavy, Daniel M.; Barrett, Richard F.

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

Some experiences with software engineering

Barrett, Richard F.

Abstract not provided.

TYPE Presentation YEAR 2019

OSTI

Performance portable parallel sparse CP-APR tensor decompositions

Teranishi, Keita; Dunlavy, Daniel M.; Barrett, Richard F.; Kolda, Tamara G.; Forster, Christopher

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

Performance portable parallel CP-APR tensor decompositions

Teranishi, Keita; Barrett, Richard F.; Forster, Christopher; Dunlavy, Daniel M.; Kolda, Tamara G.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Scheduling Chapel tasks with Qthreads on manycore: A tale of two schedulers

Proceedings of the 7th International Workshop on Runtime and Operating Systems for Supercomputers, ROSS 2017 - In conjunction with HPDC

Evans, Noah; Olivier, Stephen L.; Barrett, Richard F.; Stelle, George

This paper describes improvements in task scheduling for the Chapel parallel programming language provided in its default on-node tasking runtime, the Qthreads library. We describe a new scheduler, distrib, which builds on the approaches of two previous Qthreads schedulers, Sherwood and Nemesis, and combines the best aspects of both (work stealing and load balancing from Sherwood, and lock-free queue access from Nemesis) to make task queuing better suited for the use of Chapel in the manycore era. We demonstrate the efficacy of this new scheduler by showing improvements in various individual benchmarks of the Chapel test suite on the Intel Knights Landing architecture.

TYPE Conference Poster YEAR 2017

DOI OSTI Scopus

Social analytics

Barrett, Richard F.; Caswell, Jacob

TYPE Presentation YEAR 2016

OSTI

Exploring the performance potential of the Chapel programming language

Barrett, Richard F.; Caswell, Jacob

Abstract not provided.

TYPE Presentation YEAR 2016

OSTI

Final Review Memo from ATDM L2 Milestone Review Panel to ATDM L2 Milestone Team and Associated Management

Hough, Patricia D.; Barone, Matthew F.; Barrett, Richard F.; Mish, Kyran D.; Thornquist, Heidi K.

On Thursday, August 25, 2016, the ATDM L2 milestone review panel met with the milestone team to conduct a final assessment of the completeness and quality of the work performed. First and foremost, the panel would like to congratulate and commend the milestone team for a job well done. The team completed a significant body of high-quality work toward very ambitious goals. Additionally, their persistence in working through the technical challenges associated with evolving technology, the nontechnical challenges associated with integrating across multiple software development teams, and the many demands on their time speaks volumes about their commitment to delivering the best work possible to advance the ATDM program. The panel’s comments on the individual completion criteria appear in the last section of this memo.

TYPE Other Report YEAR 2016

DOI OSTI

Exploring the performance potential of the Chapel programming language

Barrett, Richard F.

Abstract not provided.

TYPE Presentation YEAR 2016

OSTI

Some Scientific Supercomputing

Barrett, Richard F.

Abstract not provided.

TYPE Presentation YEAR 2016

OSTI

Enabling tractable exploration of the performance of adaptive mesh refinement

Proceedings - IEEE International Conference on Cluster Computing, ICCC

Vaughan, Courtenay T.; Barrett, Richard F.

A broad range of physical phenomena in science and engineering can be explored using finite difference and volume based application codes. Incorporating Adaptive Mesh Refinement (AMR) into these codes focuses attention on the most critical parts of a simulation, enabling increased numerical accuracy of the solution while limiting memory consumption. However, adaptivity comes at the cost of increased runtime complexity, which is particularly challenging on emerging and expected future architectures. In order to explore the design space offered by new computing environments, we have developed a proxy application called miniAMR. MiniAMR exposes a range of the important issues that will significantly impact the performance potential of full application codes. In this paper, we describe miniAMR, demonstrate what it is designed to represent in a full application code, and illustrate how it can be used to exploit future high performance computing architectures. To ensure an accurate understanding of what miniAMR is intended to represent, we compare it with CTH, a shock hydrodynamics code in heavy use throughout several computational science and engineering communities.

TYPE Conference Poster YEAR 2015

OSTI Scopus

Assessing a mini-application as a performance proxy for a finite element method engineering application

Concurrency and Computation. Practice and Experience

Lin, Paul T.; Heroux, Michael A.; Williams, Alan B.; Barrett, Richard F.

The performance of a large-scale, production-quality science and engineering application (‘app’) is often dominated by a small subset of the code. Even within that subset, computational and data access patterns are often repeated, so that an even smaller portion can represent the performance-impacting features. If application developers, parallel computing experts, and computer architects can together identify this representative subset and then develop a small mini-application (‘miniapp’) that can capture these primary performance characteristics, then this miniapp can be used both to improve the performance of the app and to provide a tool for co-design for the high-performance computing community. However, a critical question is whether a miniapp can effectively capture key performance behavior of an app. This study provides a comparison of an implicit finite element semiconductor device modeling app on unstructured meshes with an implicit finite element miniapp on unstructured meshes. The goal is to assess whether the miniapp is predictive of the performance of the app. Single compute node performance is compared, as well as scaling up to 16,000 cores. Results indicate that the miniapp can be reasonably predictive of the performance characteristics of the app for a single iteration of the solver on a single compute node.

TYPE Journal Article YEAR 2015

OSTI DOI

Exploring Communication Options with Adaptive Mesh Refinement

Vaughan, Courtenay T.; Barrett, Richard F.

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

PerDome: A Performance Model for Heterogeneous Computing Systems

Barrett, Richard F.; Tang, Li; Hu, X.S.

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Performance Analysis of a Program Understanding Static Analysis Signature Search Algorithm

Deshpande, Aditya M.; Draper, Jeffrey T.; Rigdon, James B.; Barrett, Richard F.

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Co-design with proxy applications at the DOE NNSA Trilabs: performance of SNL proxies on modern and some future architectures

Lin, Paul T.; Barrett, Richard F.

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Toward an evolutionary task parallel integrated MPI + X Programming Model

Proceedings of the 6th International Workshop on Programming Models and Applications for Multicores and Manycores, PMAM 2015

Barrett, Richard F.; Stark, Dylan T.; Vaughan, Courtenay T.; Grant, Ryan; Olivier, Stephen L.; Foulk, James W.

The Bulk Synchronous Parallel programming model is showing performance limitations at high processor counts. We propose over-decomposition of the domain, operated on as tasks, to smooth out utilization of the computing resource, in particular the node interconnect and processing cores, and hide intra- and inter-node data movement. Our approach maintains the existing coding style commonly employed in computational science and engineering applications. Although we show improved performance on existing computers, up to 131,072 processor cores, the effectiveness of this approach on expected future architectures will require the continued evolution of capabilities throughout the codesign stack. Success will then not only result in decreased time to solution, but will also make better use of the hardware capabilities and reduce power and energy requirements, while fundamentally maintaining the current code configuration strategy.

TYPE Conference Poster YEAR 2015

DOI OSTI Scopus

Toward an Evolutionary Task Parallel Integrated MPI + X Programming Model

Barrett, Richard F.

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Toward an Evolutionary Task Parallel Integrated MPI + X Programming Model

Barrett, Richard F.; Stark, Dylan T.; Vaughan, Courtenay T.; Grant, Ryan; Olivier, Stephen L.; Foulk, James W.

Abstract not provided.

TYPE Conference Poster YEAR 2014

DOI OSTI

Performance and Energy Implications for Heterogeneous Computing Systems: A MiniFE Case Study

Barrett, Richard F.; Tang, Li; Hu, X.S.

Heterogeneous computing systems, which employ a mix of general-purpose (GP) processors and accelerators such as graphics processing units (GPUs) or Field Programmable Gate Arrays (FPGAs), have the potential to offer much higher performance and lower energy usage than homogeneous systems. However, designing heterogeneous computing systems to achieve high performance and low energy usage is a challenging task. Designs that offer higher performance do not necessarily lead to lower energy consumption. Furthermore, mapping of applications to different computing devices can play a key role in performance and energy tradeoff. In this report, we present a detailed performance and energy study of executing a specific mini-application on different heterogeneous systems. The results show that hardware choices, application implementations, and mapping of applications to hardware can all significantly impact system performance and energy consumption and that the impact on performance and energy can be quite different. This study forms a basis for modeling the interdependencies of program structures and hardware execution units, which could be used to guide design space exploration.

TYPE SAND Report YEAR 2014

DOI OSTI

Performance Analysis of a Program Understanding Static Analysis Signature Search Algorithm

Barrett, Richard F.; Rigdon, James B.; Deshpande, Aditya M.; Draper, Jeffrey T.

Abstract not provided.

TYPE Conference Poster YEAR 2014

OSTI

Trinity Benchmarks on Xeon Phi (Knights Corner)

Rajan, Mahesh; Doerfler, Douglas W.; Hammond, Simon; Trott, Christian R.; Barrett, Richard F.

Abstract not provided.

TYPE Conference Poster YEAR 2014

OSTI

FY14 Codesign Milestone Summary

Hoekstra, Robert J.; Barrett, Richard F.; Howell, Louis; Daniel, David

This milestone was the 2nd in a series of Tri-Lab Co-Design L2 milestones supporting ‘Co-Design’ efforts in the ASC program. It is a crucial step towards evaluating the effectiveness of proxy applications in exploring code performance on next generation architectures. All three labs evaluated the performance of 2 proxy applications on modern architectures and/or testbeds for pre-production hardware. The results are captured in this document as well as annotated presentations from all 3 laboratories.

TYPE Other Report YEAR 2014

DOI OSTI

Communication Explorations for Adaptive Mesh Refinement Motivated by Emerging and Future Architectures

Vaughan, Courtenay T.; Barrett, Richard F.; Roweth, Duncan

Abstract not provided.

TYPE Presentation YEAR 2014

OSTI

Experiences with Sandia National Laboratories HPC applications and MPI performance

Rajan, Mahesh; Doerfler, Douglas W.; Barrett, Richard F.; Stevenson, Joel O.; Agelastos, Anthony M.; Shaw, Ryan; Meyer, Harold E.

Abstract not provided.

TYPE Conference Poster YEAR 2014

OSTI

Sandia's ASC Advanced Architecture Test Beds

Barrett, Richard F.

Abstract not provided.

TYPE Presentation YEAR 2014

OSTI

Abstract machine models and proxy architectures for exascale computing

Ang, James A.; Barrett, Richard F.; Benner, Robert E.; Burke, D.; Chan, C.; Donofrio, David; Hammond, Simon; Hemmert, Karl S.; Kelly, Suzanne M.; Le, H.; Leung, Vitus J.; Resnick, David R.; Rodrigues, Arun; Shalf, John; Stark, Dylan T.; Unat, Didem; Wright, N.J.

Abstract not provided.

TYPE Conference Poster YEAR 2014

DOI OSTI

Mantevo Project update

Barrett, Richard F.

Abstract not provided.

TYPE Presentation YEAR 2014

OSTI

Exploring Workloads of Adaptive Mesh Refinement

Vaughan, Courtenay T.; Barrett, Richard F.; Jayaraj, Jagan

Abstract not provided.

TYPE Conference YEAR 2014

OSTI

Sequoia CCC Investigation of factors impacting runtime performance at exascale for ASC application codes: A co-design effort

Barrett, Richard F.

Abstract not provided.

TYPE Presentation YEAR 2014

OSTI

MPPM, Viewed as a co-design effort

Proceedings of Co-HPC 2014: 1st International Workshop on Hardware-Software Co-Design for High Performance Computing - Held in Conjunction with SC 2014: The International Conference for High Performance Computing, Networking, Storage and Analysis

Woodward, Paul R.; Jayaraj, Jagan; Barrett, Richard F.

The Piecewise Parabolic Method (PPM) was designed as a means of exploring compressible gas dynamics problems of interest in astrophysics, including supersonic jets, compressible turbulence, stellar convection, and turbulent mixing and burning of gases in stellar interiors. Over time, the capabilities encapsulated in PPM have co-evolved with the availability of a series of high performance computing platforms. Implementation of the algorithm has adapted to and advanced with the architectural capabilities and characteristics of these machines. This adaptability of our PPM codes has enabled targeted astrophysical applications of PPM to exploit these scarce resources to explore complex physical phenomena. Here we describe the means by which this was accomplished, and set a path forward, with a new miniapp, mPPM, for continuing this process in a diverse and dynamic architecture design environment. Adaptations in mPPM for the latest high performance machines are discussed that address the important issue of limited bandwidth from locally attached main memory to the microprocessor chip.

TYPE Conference Poster YEAR 2014

OSTI Scopus

Early Experiences Co-Scheduling Work and Communication Tasks for Hybrid MPI+X Applications

Proceedings of ExaMPI 2014: Exascale MPI 2014 - held in conjunction with SC 2014: The International Conference for High Performance Computing, Networking, Storage and Analysis

Stark, Dylan T.; Barrett, Richard F.; Grant, Ryan; Olivier, Stephen L.; Foulk, James W.; Vaughan, Courtenay T.

Advances in node-level architecture and interconnect technology needed to reach extreme scale necessitate a reevaluation of long-standing models of computation, in particular bulk synchronous processing. The end of Dennard scaling and the subsequent increases in CPU core counts with each successive generation of general purpose processor have made the ability to leverage parallelism for communication an increasingly critical aspect for future extreme-scale application performance. But the use of massive multithreading in combination with MPI is an open research area, with many proposed approaches requiring code changes that can be unfeasible for important large legacy applications already written in MPI. This paper covers the design and initial evaluation of an extension of a massive multithreading runtime system supporting dynamic parallelism to interface with MPI to handle fine-grain parallel communication and communication-computation overlap. Our initial evaluation of the approach uses the ubiquitous stencil computation, in three dimensions, with the halo exchange as the driving example that has a demonstrated tie to real code bases. The preliminary results suggest that even for a very well-studied and balanced workload and message exchange pattern, co-scheduling work and communication tasks is effective at significant levels of decomposition using up to 131,072 cores. Furthermore, we demonstrate useful communication-computation overlap when handling blocking send and receive calls, and show evidence suggesting that we can decrease the burstiness of network traffic, with a corresponding decrease in the rate of stalls (congestion) seen on the host link and network.

TYPE Conference Poster YEAR 2014

DOI OSTI Scopus

Reducing the bulk of the bulk synchronous parallel model

Parallel Processing Letters

Barrett, Richard F.; Vaughan, Courtenay T.; Hammond, Simon

For over two decades the dominant means for enabling portable performance of computational science and engineering applications on parallel processing architectures has been the bulk-synchronous parallel programming (BSP) model. Code developers, motivated by performance considerations to minimize the number of messages transmitted, have typically pursued a strategy of aggregating message data into fewer, larger messages. Emerging and future high-performance architectures, especially those seen as targeting Exascale capabilities, provide motivation and capabilities for revisiting this approach. In this paper we explore alternative configurations within the context of a large-scale complex multi-physics application and a proxy that represents its behavior, presenting results that demonstrate some important advantages as the number of processors increases in scale.

TYPE Journal Article YEAR 2013

OSTI DOI

The Mantevo Project: Tools for Codesign (SC13 BOF)

Barrett, Richard F.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

The Mantevo Project: Tools for codesign (PSAAP 2)

Barrett, Richard F.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

I don't wanna grow up... stuck at predictive capability maturity model level zero!

Rider, William J.; Kelly, Suzanne M.; Barrett, Richard F.; Hammond, Simon

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

NNSA/ASC Test Bed Update

Hammond, Simon; Barrett, Richard F.; Vaughan, Courtenay T.; Trott, Christian R.; Laros, James H.; Kelly, Suzanne M.; Ang, James A.

Abstract not provided.

TYPE Presentation YEAR 2013

OSTI

The Mantevo Project Mini-applications: Vehicles for Co-Design

Barrett, Richard F.; Heroux, Michael A.

Abstract not provided.

TYPE Presentation YEAR 2013

OSTI

Applications Thrust update

Barrett, Richard F.

Abstract not provided.

TYPE Presentation YEAR 2013

OSTI

Proxy and Proto miniApps for Exascale Co-design

Ang, James A.; Barrett, Richard F.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

I Don't Wanna Grow Up...Stuck at Predictive Capability Maturity Model Level Zero

Kelly, Suzanne M.; Rider, William J.; Barrett, Richard F.; Hammond, Simon

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

Proxy and Proto miniApps for Exascale Co-design

Ang, James A.; Barrett, Richard F.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

Extreme-scale Computing Grand Challenge LDRD (XGC)

Hemmert, Karl S.; Barrett, Brian; Barrett, Richard F.; Lentine, Anthony L.; Rodrigues, Arun; Denton-Hill, Kim M.

Abstract not provided.

TYPE Presentation YEAR 2013

OSTI

Application Explorations for Future Interconnects

Barrett, Richard F.; Vaughan, Courtenay T.; Hammond, Simon

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

GPU Acceleration of Data Assembly in Finite Element Methods and Its Energy Implications

Barrett, Richard F.; Hammond, Simon; Hsieh, Mingyu N.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

Discussion of some new and developing miniapps

Barrett, Richard F.

Abstract not provided.

TYPE Presentation YEAR 2013

OSTI

Using the Cray Gemini Performance Counters

Pedretti, Kevin; Vaughan, Courtenay T.; Barrett, Richard F.; Devine, Karen; Hemmert, Karl S.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

Mantevo Suite 1.0

Barrett, Richard F.; Willenbring, James M.; Hammond, Simon

Abstract not provided.

TYPE Presentation YEAR 2013

OSTI

The Mantevo Project: Tools for codesign

Barrett, Richard F.

Abstract not provided.

TYPE Presentation YEAR 2013

OSTI

Extreme-scale Computing Grand Challenge (XGC)

Barrett, Brian; Barrett, Richard F.; Rodrigues, Arun; Lentine, Anthony L.; Denton-Hill, Kim M.

Abstract not provided.

TYPE Presentation YEAR 2013

OSTI

Assessing the Predictive Capabilities of Miniapps

Barrett, Richard F.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

Experiences with Xeon Phi

Hammond, Simon; Rajamanickam, Sivasankaran; Ang, James A.; Barrett, Richard F.; Doerfler, Douglas W.; Heroux, Michael A.; Laros, James H.

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

Application Explorations for Future Interconnects

Barrett, Richard F.; Vaughan, Courtenay T.; Hammond, Simon

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

Using Miniapplications in a Mantevo Framework for Optimizing Sandia's SPARC CFD Code on Multi-Core, Many-Core, and GPU-Accelerated Platforms

Barrett, Richard F.; Laros, James H.; Hammond, Simon

Abstract not provided.

TYPE Conference YEAR 2013

OSTI

Using the Cray Gemini Performance Counters

Pedretti, Kevin; Vaughan, Courtenay T.; Hemmert, Karl S.; Barrett, Richard F.

Abstract not provided.

TYPE Conference YEAR 2012

OSTI

Navigating an evolutionary fast path to exascale

Proceedings - 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, SCC 2012

Barrett, Richard F.; Hammond, Simon; Vaughan, Courtenay T.; Doerfler, Douglas W.; Heroux, Michael A.

The computing community is in the midst of a disruptive architectural change. The advent of manycore and heterogeneous computing nodes forces us to reconsider every aspect of the system software and application stack. To address this challenge there is a broad spectrum of approaches, which we roughly classify as either revolutionary or evolutionary. With the former, the entire code base is re-written, perhaps using a new programming language or execution model. The latter, which is the focus of this work, seeks a piecewise path of effective incremental change. The end effect of our approach will be revolutionary in that the control structure of the application will be markedly different in order to utilize single-instruction multiple-data/thread (SIMD/SIMT), manycore and heterogeneous nodes, but the physics code fragments will be remarkably similar. Our approach is guided by a set of mission driven applications and their proxies, focused on balancing performance potential with the realities of existing application code bases. Although the specifics of this process have not yet converged, we find that there are several important steps that developers of scientific and engineering application programs can take to prepare for making effective use of these challenging platforms. Aiding an evolutionary approach is the recognition that the performance potential of the architectures is, in a meaningful sense, an extension of existing capabilities: vectorization, threading, and a re-visiting of node interconnect capabilities. Therefore, as architectures, programming models, and programming mechanisms continue to evolve, the preparations described herein will provide significant performance benefits on existing and emerging architectures. © 2012 IEEE.

TYPE Conference YEAR 2012

OSTI Scopus

Assessing the predictive capabilities of mini-applications

Proceedings - 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, SCC 2012

Barrett, Richard F.; Crozier, Paul; Doerfler, Douglas W.; Hammond, Simon; Heroux, Michael A.; Lin, Paul T.; Trucano, Timothy G.; Vaughan, Courtenay T.; Williams, Alan B.

The push to exascale computing is informed by the assumption that the architecture, regardless of the specific design, will be fundamentally different from petascale computers. The Mantevo project has been established to produce a set of proxies, or 'miniapps,' which enable rapid exploration of key performance issues that impact a broad set of scientific application programs of interest to ASC and the broader HPC community. The conditions under which a miniapp can be confidently used as predictive of an application's behavior must be clearly elucidated. Toward this end, we have developed a methodology for assessing the predictive capabilities of application proxies. Adhering to the spirit of experimental validation, our approach provides a framework for comparing data from the application with that provided by its proxies. In this poster we present this methodology, and apply it to three miniapps developed by the Mantevo project. © 2012 IEEE.

TYPE Conference YEAR 2012

OSTI Scopus

Emerging HPC Systems and Next Generation Engineering Analysis Applications

Ang, James A.; Barrett, Richard F.; Hammond, Simon; Rodrigues, Arun

Abstract not provided.

TYPE Presentation YEAR 2012

OSTI

Miniapps: Vehicles for CoDesign

Barrett, Richard F.

Abstract not provided.

TYPE Conference YEAR 2012

OSTI

Unprecedented Scalability and Performance of the new NNSA Tri-Lab Capacity Cluster 2 (TLCC2)

Rajan, Mahesh; Doerfler, Douglas W.; Lin, Paul T.; Hammond, Simon; Barrett, Richard F.; Vaughan, Courtenay T.

Abstract not provided.

TYPE Conference YEAR 2012

OSTI

Unprecedented Scalability and Performance of the new NNSA Tri-Lab Capacity Cluster 2 (TLCC2)

Doerfler, Douglas W.; Lin, Paul T.; Hammond, Simon; Barrett, Richard F.; Vaughan, Courtenay T.

Abstract not provided.

More Details

TYPE Conference YEAR 2012

OSTI

Early Experiences with Co-Design

Hammond, Simon; Ang, James A.; Barrett, Richard F.; Doerfler, Douglas W.; Heroux, Michael A.; Laros, James H.

Abstract not provided.

More Details

TYPE Conference YEAR 2012

OSTI

Exascale Preparations using Miniapps

Barrett, Richard F.

Abstract not provided.

More Details

TYPE Conference YEAR 2012

OSTI

Some Explorations using miniGhost

Barrett, Richard F.

Abstract not provided.

More Details

TYPE Conference YEAR 2012

OSTI

Early Experiences with Intel MIC Architecture

Ang, James A.; Kelly, Suzanne M.; Benner, Robert E.; Hammond, Simon; Barrett, Richard F.; Levenhagen, Michael; Rodrigues, Arun; Pedretti, Kevin; Laros, James H.

Abstract not provided.

More Details

TYPE Conference YEAR 2012

OSTI

Toward Codesign in High Performance Computing Systems

Barrett, Richard F.; Dosanjh, Sudip S.; Heroux, Michael A.

Abstract not provided.

More Details

TYPE Conference YEAR 2012

OSTI

Miniapplications: a Promising Approach to Improve the Performance of Computational Mechanics Codes

Lin, Paul T.; Heroux, Michael A.; Williams, Alan B.; Barrett, Richard F.

Abstract not provided.

More Details

TYPE Conference YEAR 2012

OSTI

Assessing the Predictive Capabilities of Miniapps as Proxies for Full Scale Applications

Barrett, Richard F.

Abstract not provided.

More Details

TYPE Conference YEAR 2012

OSTI

Next Generation Testbeds

Barrett, Richard F.

Abstract not provided.

More Details

TYPE Conference YEAR 2012

OSTI

Early Experiences with Heterogeneous Compute

Hammond, Simon; Ang, James A.; Barrett, Richard F.; Laros, James H.; Doerfler, Douglas W.; Heroux, Michael A.; Trott, Christian R.; Crozier, Paul

Abstract not provided.

More Details

TYPE Conference YEAR 2012

OSTI

Early Experiences with Intel MIC Architecture

Ang, James A.; Hammond, Simon; Barrett, Richard F.; Levenhagen, Michael; Rodrigues, Arun; Pedretti, Kevin; Laros, James H.; Kelly, Suzanne M.

Abstract not provided.

More Details

TYPE Presentation YEAR 2012

OSTI

Navigating An Evolutionary Fast Path to Exascale

Barrett, Richard F.; Hammond, Simon; Vaughan, Courtenay T.; Doerfler, Douglas W.; Heroux, Michael A.

Abstract not provided.

More Details

TYPE SAND Report YEAR 2012

OSTI

Characterize the Role of the Mini-Applications in Predicting Key Performance Characteristics of Real Applications

Barrett, Richard F.; Doerfler, Douglas W.; Crozier, Paul; Heroux, Michael A.; Lin, Paul T.; Thornquist, Heidi K.; Trucano, Timothy G.; Vaughan, Courtenay T.

Abstract not provided.

More Details

TYPE Presentation YEAR 2012

OSTI

Exascale Design Space Exploration and Co-design

Proposed for publication in Future Generation Computer Systems.

Barrett, Richard F.; Trucano, Timothy G.; Doerfler, Douglas W.; Dosanjh, Sudip S.; Hammond, Simon; Hemmert, Karl S.; Heroux, Michael A.; Lin, Paul T.; Pedretti, Kevin P.; Rodrigues, Arun

Abstract not provided.

More Details

TYPE Journal Article YEAR 2012

OSTI

MiniGhost: A Miniapp for Exploring Boundary Exchange Strategies Using Stencil Computations in Scientific Parallel Computing

Barrett, Richard F.; Vaughan, Courtenay T.; Heroux, Michael A.

A broad range of scientific computation involves the use of difference stencils. In a parallel computing environment, this computation is typically implemented by decomposing the spatial domain, inducing a 'halo exchange' of process-owned boundary data. This approach adheres to the Bulk Synchronous Parallel (BSP) model. Because commonly available architectures provide strong inter-node bandwidth relative to latency costs, many codes 'bulk up' these messages by aggregating data into a message as a means of reducing the number of messages. A renewed focus on non-traditional architectures and architecture features provides new opportunities for exploring alternatives to this programming approach. In this report we describe miniGhost, a 'miniapp' designed for exploration of the capabilities of current as well as emerging and future architectures within the context of these sorts of applications. MiniGhost joins the suite of miniapps developed as part of the Mantevo project.

More Details

TYPE SAND Report YEAR 2012

DOI OSTI

Preparing Multi-physics Multi-scale Codes for Exascale Computing

Barrett, Richard F.

Abstract not provided.

More Details

TYPE Presentation YEAR 2012

OSTI

Overview of the XGC project

Shinde, Subhash L.; Ang, James A.; Barrett, Brian; Barrett, Richard F.; Denton-Hill, Kim M.; Lentine, Anthony L.; Murphy, Richard C.; Rodrigues, Arun; Hemmert, Karl S.

Abstract not provided.

More Details

TYPE Presentation YEAR 2012

OSTI

Demonstration of a Legacy Application's Path to Exascale - ASC L2 Milestone 4467

Barrett, Brian; Kelly, Suzanne M.; Klundt, Ruth A.; Laros, James H.; Leung, Vitus J.; Levenhagen, Michael; Lofstead, Gerald F.; Moreland, Kenneth D.; Oldfield, Ron; Pedretti, Kevin P.; Rodrigues, Arun; Barrett, Richard F.; Ward, Harry L.; Vandyke, John P.; Vaughan, Courtenay T.; Wheeler, Kyle B.; Brandt, James M.; Brightwell, Ronald B.; Curry, Matthew L.; Fabian, Nathan; Ferreira, Kurt; Gentile, Ann C.; Hemmert, Karl S.

Abstract not provided.

More Details

TYPE Presentation YEAR 2012

OSTI

Report of experiments and evidence for ASC L2 milestone 4467: demonstration of a legacy application's path to exascale

Barrett, Brian; Kelly, Suzanne M.; Klundt, Ruth A.; Laros, James H.; Leung, Vitus J.; Levenhagen, Michael; Lofstead, Gerald F.; Moreland, Kenneth D.; Oldfield, Ron; Pedretti, Kevin T.T.; Rodrigues, Arun; Barrett, Richard F.; Thompson, David; Ward, Harry L.; Vandyke, John P.; Vaughan, Courtenay T.; Wheeler, Kyle B.; Brandt, James M.; Brightwell, Ronald B.; Curry, Matthew L.; Fabian, Nathan; Ferreira, Kurt; Gentile, Ann C.; Hemmert, Karl S.

This report documents thirteen of Sandia's contributions to the Computational Systems and Software Environment (CSSE) within the Advanced Simulation and Computing (ASC) program between fiscal years 2009 and 2012. It describes their impact on ASC applications. Most contributions are implemented in lower software levels, allowing for application improvement without source code changes. Improvements are identified in such areas as reduced run time, characterizing power usage, and Input/Output (I/O). Other experiments are more forward looking, demonstrating potential bottlenecks using mini-application versions of the legacy codes and simulating their network activity on exascale-class hardware. The purpose of this report is to prove that the team has completed milestone 4467, Demonstration of a Legacy Application's Path to Exascale. Cielo is expected to be the last capability system on which existing ASC codes can run without significant modifications. This assertion will be tested to determine where the breaking point is for an existing highly scalable application. The goal is to stretch the performance boundaries of the application by applying recent CSSE R&D in areas such as resilience, power, I/O, visualization services, SMARTMAP, lightweight kernels (LWKs), virtualization, simulation, and feedback loops. Dedicated system time reservations and/or CCC allocations will be used to quantify the impact of system-level changes to extend the life and performance of the ASC code base. Finally, a simulation of anticipated exascale-class hardware will be performed using SST to supplement the calculations. Determine where the breaking point is for an existing highly scalable application: Chapter 15 presented the CSSE work that sought to identify the breaking point in two ASC legacy applications, Charon and CTH. Their mini-app versions were also employed to complete the task. There is no single breaking point, as more than one issue was found with the two codes. The results were that applications can expect to encounter performance issues related to the computing environment, system software, and algorithms. Careful profiling of runtime performance will be needed to identify the source of an issue, in strong combination with knowledge of system software and application source code.

More Details

TYPE SAND Report YEAR 2012

DOI OSTI

Using a Miniapp to Explore Computational Strategies for Multi-Core Machines

Vaughan, Courtenay T.; Barrett, Richard F.

Abstract not provided.

More Details

TYPE Conference YEAR 2012

OSTI

Mini-Applications: Tools for Co-Design

Barrett, Richard F.

Abstract not provided.

More Details

TYPE Conference YEAR 2011

OSTI

Application-Driven Analysis of Two Generations of Capability Computing Platforms: The Transition to Multicore Processors

Concurrency and Computation: Practice and Experience

Rajan, Mahesh; Vaughan, Courtenay T.; Doerfler, Douglas W.; Barrett, Richard F.; Lin, Paul T.; Pedretti, Kevin; Hemmert, Karl S.

Abstract not provided.

More Details

TYPE Journal Article YEAR 2011

OSTI

Mini-applications: Vehicles for Co-Design

Barrett, Richard F.; Heroux, Michael A.; Lin, Paul T.; Vaughan, Courtenay T.; Williams, Alan B.

Abstract not provided.

More Details

TYPE Conference YEAR 2011

OSTI

Application Driven Analysis of Two Generations of Capability Computing Platforms: Purple and Cielo

Rajan, Mahesh; Vaughan, Courtenay T.; Barrett, Richard F.; Doerfler, Douglas W.; Lin, Paul T.; Pedretti, Kevin; Hemmert, Karl S.

Abstract not provided.

More Details

TYPE Conference YEAR 2011

OSTI

Preparing Multi-physics Multi-scale Codes for Exascale HPC

Barrett, Richard F.

Abstract not provided.

More Details

TYPE Presentation YEAR 2011

OSTI

Miniapps: A Tool for CoDesign (Sedona meeting)

Barrett, Richard F.

Abstract not provided.

More Details

TYPE Presentation YEAR 2011

OSTI

Programming models discussion

Barrett, Richard F.

Abstract not provided.

More Details

TYPE Presentation YEAR 2011

OSTI

From Red Storm to Cielo: Performance Analysis of ASC Simulation Programs Across an Evolution of Multicore Architectures

Parallel Processing Letters

Barrett, Richard F.; Vaughan, Courtenay T.; Rajan, Mahesh; Doerfler, Douglas W.; Pedretti, Kevin

Abstract not provided.

More Details

TYPE Journal Article YEAR 2011

OSTI

Investigating the Impact of the Cielo Cray XE6 Architecture on Scientific Application Codes

Vaughan, Courtenay T.; Rajan, Mahesh; Barrett, Richard F.; Doerfler, Douglas W.; Pedretti, Kevin

Abstract not provided.

More Details

TYPE Conference YEAR 2011

OSTI

A Comparison of the Performance Characteristics of Capability and Capacity Class HPC Systems

Doerfler, Douglas W.; Rajan, Mahesh; Epperson, Marcus; Vaughan, Courtenay T.; Pedretti, Kevin; Barrett, Richard F.; Barrett, Brian

Abstract not provided.

More Details

TYPE Conference YEAR 2011

OSTI

Preparing Multi-physics Multi-scale Codes for Hybrid HPC

Barrett, Richard F.; Drake, Richard R.; Robinson, Allen C.

Abstract not provided.

More Details

TYPE Conference YEAR 2011

OSTI

Understanding the Performance of Scientific Simulation Programs on High Performance Computers using Proxies

Barrett, Richard F.

Abstract not provided.

More Details

TYPE Presentation YEAR 2011

OSTI

Application-driven Analysis of Two Generations of Capability Computing Platforms: Purple and Cielo

Rajan, Mahesh; Vaughan, Courtenay T.; Doerfler, Douglas W.; Lin, Paul T.; Pedretti, Kevin; Barrett, Richard F.; Hemmert, Karl S.

Abstract not provided.

More Details

TYPE Conference YEAR 2010

OSTI

Investigating the impact of the Cielo Cray XE6 architecture on scientific application codes

Vaughan, Courtenay T.; Rajan, Mahesh; Barrett, Richard F.; Doerfler, Douglas W.; Pedretti, Kevin P.

Cielo, a Cray XE6, is the Department of Energy NNSA Advanced Simulation and Computing (ASC) campaign's newest capability machine. Rated at 1.37 PFLOPS, it consists of 8,944 dual-socket oct-core AMD Magny-Cours compute nodes, linked using Cray's Gemini interconnect. Its primary mission objective is to enable a suite of the ASC applications implemented using MPI to scale to tens of thousands of cores. Cielo is an evolutionary improvement to a successful architecture previously available to many of our codes, thus enabling a basis for understanding the capabilities of this new architecture. Using three codes strategically important to the ASC campaign, and supplemented with some micro-benchmarks that expose the fundamental capabilities of the XE6, we report on the performance characteristics and capabilities of Cielo.

More Details

TYPE Conference YEAR 2010

OSTI

Some Computational Science Experiences

Barrett, Richard F.

Abstract not provided.

More Details

TYPE Presentation YEAR 2010

OSTI
