Publications Search

The effort to develop larger-scale computing systems introduces a set of related challenges: Large machines are more difficult to synchronize. The sheer quantity of hardware introduces more opportunities for errors. New approaches to hardware, such as low-energy or neuromorphic devices, are not directly programmable by traditional methods.

TYPE Other Report YEAR 2019

DOI OSTI

P1673: A proposal for a C++ Standard linear algebra library

Hoemmen, Mark F.

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

mdspan in C++: A Case Study in the Integration of Performance Portable Features into International Language Standards

Trott, Christian R.; Hollman, David S.; Sunderland, Daniel; Hoemmen, Mark F.; Edwards, Carter; Adelstein-Lelbach, Bryce

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

Progress Towards a Performance-Portable SIERRA/Aria

Brunini, Victor; Clausen, Jonathan; Hoemmen, Mark F.; Kucala, Alec; Phillips, Malachi; Trott, Christian R.

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

Performance in Matrix Assembly Using Trilinos

Fuller, Timothy J.; Hoemmen, Mark F.; Mclendon, William; Siefert, Christopher

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

A free function linear algebra interface based on the BLAS

Caday, Peter; Hoemmen, Mark F.; Hollman, David S.; Liber, Nevin; Lo, Li-Ta; Lopez, Graham; Luszczek, Piotr; Knepper, Sarah; Trott, Christian R.

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

Evolving a Standard C++ Linear Algebra Library from the BLAS

Hoemmen, Mark F.

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

Evolving a standard C++ linear algebra library

Hoemmen, Mark F.

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

The latest in Tpetra: Trilinos' parallel sparse linear algebra

Hoemmen, Mark F.

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

Communication-avoiding & pipelined Krylov solvers in Trilinos

Yamazaki, Ichitaro; Hoemmen, Mark F.; Boman, Erik G.; Dongarra, Jack

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

Progress Towards a Performance-Portable SIERRA/Aria

Brunini, Victor; Clausen, Jonathan; Hoemmen, Mark F.; Kucala, Alec; Phillips, Malachi; Trott, Christian R.

Abstract not provided.

TYPE Presentation YEAR 2019

OSTI

Communication-avoiding and pipelined Krylov solvers in Trilinos

Hoemmen, Mark F.; Yamazaki, Ichitaro

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

Modern C++ in Computational Science

Hollman, David S.; Hoemmen, Mark F.; Sunderland, Daniel; Trott, Christian R.

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

P1417: Lessons learned for linear algebra library standardization

Hoemmen, Mark F.

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

P1417: Historical lessons for C++ linear algebra library standardization

Hoemmen, Mark F.; Badwaik, Jayesh; Brucher, Matthieu

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

Runtime Polymorphism in Kokkos Applications

Brunini, Victor; Clausen, Jonathan; Hoemmen, Mark F.; Kucala, Alec; Trott, Christian R.; Howard, Micah

Abstract not provided.

TYPE Presentation YEAR 2018

OSTI

Deprecating and removing most of Tpetra's template parameters

Hoemmen, Mark F.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

WBS STPR 04 Milestone 4 Report

Trott, Christian R.; Sunderland, Daniel; Hoemmen, Mark F.

This report documents the completion of milestone STPRO4-4 Kokkos back-ends research, collaborations, development, optimization, and documentation. The Kokkos team updated its existing backend to support the software stack and hardware of DOE's Sierra, Summit and Astra machines. They also collaborated with ECP PathForward vendors on developing backends for possible exascale architectures. Furthermore, the team ramped up its engagement with the ISO/C++ committee to accelerate the adoption of features important for the HPC community into the C++ standard.

TYPE Other Report YEAR 2018

DOI OSTI

WBS STPR 04 Milestone 4 Report

Sunderland, Daniel; Hoemmen, Mark F.; Trott, Christian R.

This report documents the completion of milestone STPRO4-4 Kokkos back-ends research, collaborations, development, optimization, and documentation. The Kokkos team updated its existing backend to support the software stack and hardware of DOE's Sierra, Summit and Astra machines. They also collaborated with ECP PathForward vendors on developing backends for possible exascale architectures. Furthermore, the team ramped up its engagement with the ISO/C++ committee to accelerate the adoption of features important for the HPC community into the C++ standard.

TYPE Other Report YEAR 2018

DOI OSTI

Employing Multiple Levels of Parallelism for CFD at Large Scales on Next Generation High-Performance Computing Platforms

Howard, Micah; Fisher, Travis C.; Hoemmen, Mark F.; Dinzl, Derek J.; Overfelt, James R.; Bradley, Andrew M.; Kim, Kyungjoo

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Migrating a production multiphysics finite element code to next-generation and heterogeneous architectures using the Kokkos abstraction layer

Clausen, Jonathan; Brunini, Victor; Hoemmen, Mark F.; Noble, David R.; Trott, Christian R.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Thread Parallel Message Packing for Sparse Matrix MPI Communication

Fuller, Timothy J.; Hoemmen, Mark F.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

You get the pointer but only on my terms: Control local data access for better use of modern computer hardware and programming models

Hoemmen, Mark F.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Production implementations of pipelined and communication-avoiding iterative solvers

Yamazaki, Ichitaro; Hoemmen, Mark F.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Performance Portability in SPARC

Overfelt, James R.; Howard, Micah; Bradley, Andrew M.; Bova, Steven W.; Fisher, Travis C.; Wagnild, Ross M.; Dinzl, Derek J.; Hoemmen, Mark F.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Threaded Assembly in Aria Expressions

Clausen, Jonathan; Brunini, Victor; Forster, Chris; Noble, David R.; Hoemmen, Mark F.; Hammond, Simon; Trott, Christian R.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Production-Ready Exascale-Enabled Krylov Solvers

Boman, Erik G.; Hoemmen, Mark F.; Anzt, Hartwig; Dongarra, Jack; Yamazaki, Ichitaro

Abstract not provided.

TYPE Presentation YEAR 2018

OSTI

You get the pointer but only on MY terms

Hoemmen, Mark F.

Abstract not provided.

TYPE Conference Poster YEAR 2017

OSTI

KokkosKernels: Performance-Portable Sparse Dense and Graph Kernels

Rajamanickam, Sivasankaran; Bradley, Andrew M.; Deveci, Mehmet; Hoemmen, Mark F.; Hammond, Simon; Kim, Kyungjoo; Trott, Christian R.

Abstract not provided.

TYPE Presentation YEAR 2017

OSTI

How to configure and build Trilinos

Hoemmen, Mark F.

Abstract not provided.

TYPE Conference Poster YEAR 2017

OSTI

1541 L2 Milestone: Thread Scalable Expression Assembly in Aria

Clausen, Jonathan; Brunini, Victor; Forster, Christopher J.; Noble, David R.; Trott, Christian R.; Hammond, Simon; Hoemmen, Mark F.; Lin, Paul T.

Abstract not provided.

TYPE Presentation YEAR 2017

OSTI

Prototyping the Next Generation of Aria

Clausen, Jonathan; Brunini, Victor; Forster, Christopher J.; Noble, David R.; Trott, Christian R.; Hammond, Simon; Hoemmen, Mark F.; Lin, Paul T.

Abstract not provided.

TYPE Conference Poster YEAR 2017

OSTI

Improving performance of GMRES by reducing communication and pipelining global collectives

Proceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2017

Yamazaki, Ichitaro; Hoemmen, Mark F.; Luszczek, Piotr; Dongarra, Jack

We compare the performance of pipelined and s-step GMRES, respectively referred to as l-GMRES and s-GMRES, on distributed multicore CPUs. Compared to standard GMRES, s-GMRES requires fewer all-reduces, while l-GMRES overlaps the all-reduces with computation. To combine the best features of the two algorithms, we propose another variant, (l, t)-GMRES, that not only does fewer global all-reduces than standard GMRES, but also overlaps those all-reduces with other work. We implemented thread parallelism and communication overlap in two different ways. The first uses nonblocking MPI collectives with thread-parallel computational kernels. The second relies on a shared-memory task scheduler. In our experiments, (l, t)-GMRES performed better than l-GMRES by factors of up to 1.67×. In addition, though we only used 50 nodes, when the latency cost became significant, our variant performed up to 1.22× better than s-GMRES by hiding all-reduces.

TYPE Conference Poster YEAR 2017

DOI OSTI Scopus

Embedded ensemble propagation for improving performance, portability, and scalability of uncertainty quantification on emerging computational architectures

SIAM Journal on Scientific Computing

Phipps, Eric T.; Edwards, Harold C.; Hoemmen, Mark F.; Hu, Jonathan J.; Rajamanickam, Sivasankaran

Quantifying simulation uncertainties is a critical component of rigorous predictive simulation. A key component of this is forward propagation of uncertainties in simulation input data to output quantities of interest. Typical approaches involve repeated sampling of the simulation over the uncertain input data, and can require numerous samples when accurately propagating uncertainties from large numbers of sources. Often simulation processes from sample to sample are similar and much of the data generated from each sample evaluation could be reused. We explore a new method for implementing sampling methods that simultaneously propagates groups of samples together in an embedded fashion, which we call embedded ensemble propagation. We show how this approach takes advantage of properties of modern computer architectures to improve performance by enabling reuse between samples, reducing memory bandwidth requirements, improving memory access patterns, improving opportunities for fine-grained parallelization, and reducing communication costs. We describe a software technique for implementing embedded ensemble propagation based on the use of C++ templates and describe its integration with various scientific computing libraries within Trilinos. We demonstrate improved performance, portability and scalability for the approach applied to the simulation of partial differential equations on a variety of CPU, GPU, and accelerator architectures, including up to 131,072 cores on a Cray XK7 (Titan).

TYPE Journal Article YEAR 2017

DOI OSTI

Thread parallelism in Trilinos' sparse linear algebra interfaces & linear solvers

Hoemmen, Mark F.

Abstract not provided.

TYPE Conference Poster YEAR 2017

OSTI

Enabling Low Mach Fluid Simulations Using Trilinos

Hu, Jonathan J.; Devine, Karen; Hoemmen, Mark F.; Lin, Paul T.; Rajamanickam, Sivasankaran; Roberts, Nathan V.; Siefert, Christopher; Trott, Christian R.; Prokopenko, Andrey

Abstract not provided.

TYPE Conference Poster YEAR 2017

OSTI

Using Kokkos for Performance Portability of the Tpetra Sparse Linear Algebra Library on Intel KNL and NVIDIA GPUs

Hoemmen, Mark F.

Abstract not provided.

TYPE Conference Poster YEAR 2017

OSTI

Towards a performance portable compressible CFD code

23rd AIAA Computational Fluid Dynamics Conference, 2017

Howard, Micah; Bradley, Andrew M.; Bova, Steven W.; Overfelt, James R.; Wagnild, Ross M.; Dinzl, Derek J.; Hoemmen, Mark F.; Klinvex, Alicia M.

High performance computing (HPC) is undergoing a dramatic change in computing architectures. Next-generation HPC systems are being based primarily on many-core processing units and general purpose graphics processing units (GPUs). A computing node on a next-generation system can be, and in practice is, heterogeneous in nature, involving multiple memory spaces and multiple execution spaces. This presents a challenge for the development of application codes that wish to compute at the extreme scales afforded by these next-generation HPC technologies and systems - the best parallel programming model for one system is not necessarily the best parallel programming model for another. This inevitably raises the following question: how does an application code achieve high performance on disparate computing architectures without having entirely different, or at least significantly different, code paths, one for each architecture? This question has given rise to the term ‘performance portability’, a notion concerned with porting application code performance from architecture to architecture using a single code base. In this paper, we present the work being done at Sandia National Labs to develop a performance portable compressible CFD code that is targeting the ‘leadership’ class supercomputers the National Nuclear Security Administration (NNSA) is acquiring over the course of the next decade.

TYPE Conference Poster YEAR 2017

OSTI Scopus