Publications Search

Proceedings of P3HPC 2019: International Workshop on Performance, Portability and Productivity in HPC - Held in conjunction with SC 2019: The International Conference for High Performance Computing, Networking, Storage and Analysis

Hollman, David S.; Lelbach, Bryce; Edwards, H.C.; Hoemmen, Mark F.; Sunderland, Daniel S.; Trott, Christian R.

Multi-dimensional arrays are ubiquitous in high-performance computing (HPC), but their absence from the C++ language standard is a long-standing and well-known limitation of their use for HPC. This paper describes the design and implementation of mdspan, a proposed C++ standard multidimensional array view (planned for inclusion in C++23). The proposal is largely inspired by work done in the Kokkos project - a C++ performance-portable programming model deployed by numerous HPC institutions to prepare their code base for exascale-class supercomputing systems. This paper describes the final design of mdspan after a five-year process to achieve consensus in the C++ community. In particular, we will lay out how the design addresses some of the core challenges of performance-portable programming, and how its customization points allow a seamless extension into areas not currently addressed by the C++ Standard but which are of critical importance in the heterogeneous computing world of today's systems. Finally, we have provided a production-quality implementation of the proposal in its current form. This work includes several benchmarks of this implementation aimed at demonstrating the zero-overhead nature of the modern design.
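The idea of a non-owning multidimensional view with a swappable layout can be sketched in a few lines. The following is only a conceptual illustration in Python, not the mdspan API: the names `MdView`, `layout_right`, and `layout_left` are hypothetical and chosen for this sketch, and the real proposal expresses the same customization through C++ template parameters.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


def layout_right(extents: Tuple[int, ...], idx: Tuple[int, ...]) -> int:
    """Row-major mapping: the last index varies fastest in memory."""
    offset = 0
    for extent, i in zip(extents, idx):
        offset = offset * extent + i
    return offset


def layout_left(extents: Tuple[int, ...], idx: Tuple[int, ...]) -> int:
    """Column-major mapping: the first index varies fastest in memory."""
    offset = 0
    for extent, i in zip(reversed(extents), reversed(idx)):
        offset = offset * extent + i
    return offset


@dataclass
class MdView:
    """Hypothetical non-owning view: flat storage + extents + layout mapping."""
    data: List[float]
    extents: Tuple[int, ...]
    layout: Callable[[Tuple[int, ...], Tuple[int, ...]], int] = layout_right

    def __getitem__(self, idx: Tuple[int, ...]) -> float:
        return self.data[self.layout(self.extents, idx)]

    def __setitem__(self, idx: Tuple[int, ...], value: float) -> None:
        self.data[self.layout(self.extents, idx)] = value


# The same flat buffer viewed as a 2x3 array under two layouts.
buf = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
row_major = MdView(buf, (2, 3))
col_major = MdView(buf, (2, 3), layout_left)
```

Because the view does not own `buf`, a write through one view is visible through the other; only the index-to-offset mapping differs.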

More Details

TYPE Conference Poster YEAR 2019

Scopus OSTI

mdspan in C++: A Case Study in the Integration of Performance Portable Features into International Language Standards

Hollman, David S.; Lelbach, Bryce L.; Edwards, H.C.; Hoemmen, Mark F.; Sunderland, Daniel S.; Trott, Christian R.

Abstract not provided.

More Details

TYPE Presentation YEAR 2019

OSTI DOI

Purging reliance on UVM from Tpetra & downstream solvers

Hoemmen, Mark F.

Abstract not provided.

More Details

TYPE Presentation YEAR 2019

OSTI

P1673: A proposal for a C++ Standard linear algebra library

Hoemmen, Mark F.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2019

OSTI

Performance in Matrix Assembly Using Trilinos

Fuller, Timothy J.; Hoemmen, Mark F.; McLendon, William C.; Siefert, Christopher S.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2019

OSTI

Progress Towards a Performance-Portable SIERRA/Aria

Brunini, Victor B.; Clausen, Jonathan C.; Hoemmen, Mark F.; Kucala, Alec K.; Phillips, Malachi P.; Trott, Christian R.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2019

OSTI

Evolving a Standard C++ Linear Algebra Library from the BLAS

Hoemmen, Mark F.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2019

OSTI

A free function linear algebra interface based on the BLAS

Caday, Peter C.; Hoemmen, Mark F.; Hollman, David S.; Liber, Nevin L.; Lo, Li-Ta L.; Lopez, Graham L.; Luszczek, Piotr L.; Knepper, Sarah K.; Trott, Christian R.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2019

OSTI

Evolving a standard C++ linear algebra library

Hoemmen, Mark F.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2019

OSTI

The latest in Tpetra: Trilinos' parallel sparse linear algebra

Hoemmen, Mark F.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2019

OSTI

Communication-avoiding & pipelined Krylov solvers in Trilinos

Yamazaki, Ichitaro Y.; Hoemmen, Mark F.; Boman, Erik G.; Dongarra, Jack D.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2019

OSTI

Communication-avoiding and pipelined Krylov solvers in Trilinos

Hoemmen, Mark F.; Yamazaki, Ichitaro Y.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2019

OSTI

Progress Towards a Performance-Portable SIERRA/Aria

Brunini, Victor B.; Clausen, Jonathan C.; Hoemmen, Mark F.; Kucala, Alec K.; Phillips, Malachi P.; Trott, Christian R.

Abstract not provided.

More Details

TYPE Presentation YEAR 2019

OSTI

Modern C++ in Computational Science

Hollman, David S.; Hoemmen, Mark F.; Sunderland, Daniel S.; Trott, Christian R.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2019

OSTI

P1417: Historical lessons for C++ linear algebra library standardization

Hoemmen, Mark F.; Badwaik, Jayesh B.; Brucher, Matthieu B.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2019

OSTI

P1417: Lessons learned for linear algebra library standardization

Hoemmen, Mark F.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2019

OSTI

Runtime Polymorphism in Kokkos Applications

Brunini, Victor B.; Clausen, Jonathan C.; Hoemmen, Mark F.; Kucala, Alec K.; Trott, Christian R.; Howard, Micah A.

Abstract not provided.

More Details

TYPE Presentation YEAR 2019

OSTI

Deprecating and removing most of Tpetra's template parameters

Hoemmen, Mark F.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2018

OSTI

WBS STPR 04 Milestone 4 Report

Sunderland, Daniel S.; Hoemmen, Mark F.; Trott, Christian R.

This report documents the completion of milestone STPRO4-4 Kokkos back-ends research, collaborations, development, optimization, and documentation. The Kokkos team updated its existing backend to support the software stack and hardware of DOE's Sierra, Summit and Astra machines. They also collaborated with ECP PathForward vendors on developing backends for possible exa-scale architectures. Furthermore, the team ramped up its engagement with the ISO/C++ committee to accelerate the adoption of features important for the HPC community into the C++ standard.

More Details

TYPE Other Report YEAR 2018

OSTI DOI

WBS STPR 04 Milestone 4 Report

Trott, Christian R.; Sunderland, Daniel S.; Hoemmen, Mark F.

This report documents the completion of milestone STPRO4-4 Kokkos back-ends research, collaborations, development, optimization, and documentation. The Kokkos team updated its existing backend to support the software stack and hardware of DOE's Sierra, Summit and Astra machines. They also collaborated with ECP PathForward vendors on developing backends for possible exa-scale architectures. Furthermore, the team ramped up its engagement with the ISO/C++ committee to accelerate the adoption of features important for the HPC community into the C++ standard.

More Details

TYPE Other Report YEAR 2018

OSTI DOI

Employing Multiple Levels of Parallelism for CFD at Large Scales on Next Generation High-Performance Computing Platforms

Howard, Micah A.; Fisher, Travis C.; Hoemmen, Mark F.; Dinzl, Derek J.; Overfelt, James R.; Bradley, Andrew M.; Kim, Kyungjoo K.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2018

OSTI

Migrating a production multiphysics finite element code to next-generation and heterogeneous architectures using the Kokkos abstraction layer

Clausen, Jonathan C.; Brunini, Victor B.; Hoemmen, Mark F.; Noble, David R.; Trott, Christian R.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2018

OSTI

Thread Parallel Message Packing for Sparse Matrix MPI Communication

Fuller, Timothy J.; Hoemmen, Mark F.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2018

OSTI

Employing Multiple Levels of Parallelism for CFD at Large Scales on Next Generation High-Performance Computing Platforms

Howard, Micah A.; Fisher, Travis C.; Hoemmen, Mark F.; Dinzl, Derek J.; Overfelt, James R.; Bradley, Andrew M.; Kim, Kyungjoo K.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2018

OSTI

You get the pointer but only on my terms: Control local data access for better use of modern computer hardware and programming models

Hoemmen, Mark F.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2018

OSTI

Production implementations of pipelined and communication-avoiding iterative solvers

Yamazaki, Ichitaro Y.; Hoemmen, Mark F.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2018

OSTI

Threaded Assembly in Aria Expressions

Clausen, Jonathan C.; Brunini, Victor B.; Forster, Chris F.; Noble, David R.; Hoemmen, Mark F.; Hammond, Simon D.; Trott, Christian R.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2018

OSTI

Production-Ready Exascale-Enabled Krylov Solvers

Boman, Erik G.; Hoemmen, Mark F.; Anzt, Hartwig A.; Dongarra, Jack D.; Yamazaki, Ichitaro Y.

Abstract not provided.

More Details

TYPE Presentation YEAR 2018

OSTI

Performance Portability in SPARC

Overfelt, James R.; Howard, Micah A.; Bradley, Andrew M.; Bova, S.W.; Fisher, Travis C.; Wagnild, Ross M.; Dinzl, Derek J.; Hoemmen, Mark F.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2018

OSTI

You get the pointer but only on MY terms

Hoemmen, Mark F.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2018

OSTI

KokkosKernels: Performance-Portable Sparse, Dense, and Graph Kernels

Rajamanickam, Sivasankaran R.; Bradley, Andrew M.; Deveci, Mehmet D.; Hoemmen, Mark F.; Hammond, Simon D.; Kim, Kyungjoo K.; Trott, Christian R.

Abstract not provided.

More Details

TYPE Presentation YEAR 2017

OSTI

How to configure and build Trilinos

Hoemmen, Mark F.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI

1541 L2 Milestone: Thread Scalable Expression Assembly in Aria

Clausen, Jonathan C.; Brunini, Victor B.; Forster, Christopher J.; Noble, David R.; Trott, Christian R.; Hammond, Simon D.; Hoemmen, Mark F.; Lin, Paul L.

Abstract not provided.

More Details

TYPE Presentation YEAR 2017

OSTI

Prototyping the Next Generation of Aria

Clausen, Jonathan C.; Brunini, Victor B.; Forster, Christopher J.; Noble, David R.; Trott, Christian R.; Hammond, Simon D.; Hoemmen, Mark F.; Lin, Paul L.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI

Improving performance of GMRES by reducing communication and pipelining global collectives

Proceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2017

Yamazaki, Ichitaro; Hoemmen, Mark F.; Luszczek, Piotr; Dongarra, Jack

We compare the performance of pipelined and s-step GMRES, respectively referred to as l-GMRES and s-GMRES, on distributed multicore CPUs. Compared to standard GMRES, s-GMRES requires fewer all-reduces, while l-GMRES overlaps the all-reduces with computation. To combine the best features of two algorithms, we propose another variant, (l, t)-GMRES, that not only does fewer global all-reduces than standard GMRES, but also overlaps those all-reduces with other work. We implemented the thread-parallelism and communication-overlap in two different ways. The first uses nonblocking MPI collectives with thread-parallel computational kernels. The second relies on a shared-memory task scheduler. In our experiments, (l, t)-GMRES performed better than l-GMRES by factors of up to 1.67×. In addition, though we only used 50 nodes, when the latency cost became significant, our variant performed up to 1.22× better than s-GMRES by hiding all-reduces.

More Details

TYPE Conference Poster YEAR 2017

Scopus OSTI DOI

Towards a Performance Portable Compressible CFD Code

Howard, Micah A.; Bradley, Andrew M.; Bova, S.W.; Overfelt, James R.; Wagnild, Ross M.; Dinzl, Derek J.; Hoemmen, Mark F.; Klinvex, Alicia M.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI

Thread parallelism in Trilinos' sparse linear algebra interfaces & linear solvers

Hoemmen, Mark F.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI

Enabling Low Mach Fluid Simulations Using Trilinos

Hu, Jonathan J.; Devine, Karen D.; Hoemmen, Mark F.; Lin, Paul L.; Rajamanickam, Sivasankaran R.; Roberts, Nathan V.; Siefert, Christopher S.; Trott, Christian R.; Prokopenko, Andrey P.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI

Using Kokkos for Performance Portability of the Tpetra Sparse Linear Algebra Library on Intel KNL and NVIDIA GPUs

Hoemmen, Mark F.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI

Prototyping the Next-Generation of Aria

Brunini, Victor B.; Clausen, Jonathan C.; Noble, David R.; Forster, Christopher J.; Trott, Christian R.; Hammond, Simon D.; Hoemmen, Mark F.; Lin, Paul L.

Abstract not provided.

More Details

TYPE Presentation YEAR 2017

OSTI

Towards a performance portable compressible CFD code

23rd AIAA Computational Fluid Dynamics Conference, 2017

Howard, Micah A.; Bradley, Andrew M.; Bova, S.W.; Overfelt, James R.; Wagnild, Ross M.; Dinzl, Derek J.; Hoemmen, Mark F.; Klinvex, Alicia M.

High performance computing (HPC) is undergoing a dramatic change in computing architectures. Next-generation HPC systems are being based primarily on many-core processing units and general purpose graphics processing units (GPUs). A computing node on a next-generation system can be, and in practice is, heterogeneous in nature, involving multiple memory spaces and multiple execution spaces. This presents a challenge for the development of application codes that wish to compute at the extreme scales afforded by these next-generation HPC technologies and systems - the best parallel programming model for one system is not necessarily the best parallel programming model for another. This inevitably raises the following question: how does an application code achieve high performance on disparate computing architectures without having entirely different, or at least significantly different, code paths, one for each architecture? This question has given rise to the term ‘performance portability’, a notion concerned with porting application code performance from architecture to architecture using a single code base. In this paper, we present the work being done at Sandia National Labs to develop a performance portable compressible CFD code that is targeting the ‘leadership’ class supercomputers the National Nuclear Security Administration (NNSA) is acquiring over the course of the next decade.

More Details

TYPE Conference Poster YEAR 2017

Scopus OSTI

Summary of current thread parallelization efforts in Trilinos' linear algebra and solvers

Hoemmen, Mark F.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI

Threads & CUDA status of Tpetra & downstream solvers

Hoemmen, Mark F.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2016

OSTI

Trilinos NGP Planning

Rajamanickam, Sivasankaran R.; Devine, Karen D.; Hu, Jonathan J.; Hoemmen, Mark F.

Abstract not provided.

More Details

TYPE Presentation YEAR 2016

OSTI

KokkosKernels Introduction: Design, API, and Performance

Deveci, Mehmet D.; Rajamanickam, Sivasankaran R.; Kim, Kyungjoo K.; Bradley, Andrew M.; Trott, Christian R.; Hoemmen, Mark F.; Boman, Erik G.

Abstract not provided.

More Details

TYPE Presentation YEAR 2016

OSTI

Kokkos Technical Review Slides and Discussion Notes

Edwards, Harold C.; Sunderland, Daniel S.; Hoemmen, Mark F.; Ellingwood, Nathan D.; Trott, Christian R.; Mackey, Greg

Abstract not provided.

More Details

TYPE Presentation YEAR 2016

OSTI

Anasazi & Belos tutorial

Klinvex, Alicia M.; Hoemmen, Mark F.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2016

OSTI

Epetra & Tpetra (Sparse linear algebra) overview

Hoemmen, Mark F.; Klinvex, Alicia M.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2016

OSTI

How to build Trilinos

Hoemmen, Mark F.; Klinvex, Alicia M.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2016

OSTI

An overview of Trilinos

Hoemmen, Mark F.; Klinvex, Alicia M.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2016

OSTI

Dynamical System for Resilient Computing

Rothganger, Fredrick R.; Hoemmen, Mark F.; Phipps, Eric T.; Warrender, Christina E.

Abstract not provided.

More Details

TYPE Other Report YEAR 2016

OSTI DOI

Optimization of block sparse matrix-vector multiplication on shared-memory parallel architectures

Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016

Eberhardt, Ryan; Hoemmen, Mark F.

We examine the implementation of block compressed row storage (BCSR) sparse matrix-vector multiplication (SpMV) for sparse matrices with dense block substructure, optimized for blocks with sizes from 2x2 to 32x32, on CPU, Intel many-integrated-core, and GPU architectures. Previous research on SpMV for matrices with dense block substructure has largely focused on the design of novel data structures to optimize performance for specific architectures or to store variable-sized, variably-aligned blocks, but depending on alternate storage formats breaks compatibility with existing preconditioners and solvers or imposes significant runtime costs when converting between matrix formats. This paper instead focuses on the optimization of SpMV using the standard block compressed row storage (BCSR) format. We give a set of algorithms that performs SpMV up to 4x faster than the NVIDIA cuSPARSE cusparseDbsrmv routine, up to 147x faster than the Intel Math Kernel Library (MKL) mkl_dbsrmv routine (a single-threaded BCSR SpMV kernel), and up to 3x faster than the MKL mkl_dcsrmv routine (a multi-threaded CSR SpMV kernel).
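For readers unfamiliar with the BCSR format named in the abstract, a plain reference version of SpMV over it may help. This is a minimal, unoptimized sketch and not one of the paper's algorithms; the function name `bcsr_spmv` and the choice of row-major storage inside each block are assumptions made for this illustration.

```python
from typing import List


def bcsr_spmv(block_size: int,
              row_ptr: List[int],
              col_idx: List[int],
              blocks: List[List[float]],
              x: List[float]) -> List[float]:
    """Compute y = A @ x for a matrix A stored in BCSR form.

    row_ptr[i]..row_ptr[i+1] indexes the blocks of block row i,
    col_idx[k] is the block column of block k, and blocks[k] holds
    the block's block_size*block_size entries in row-major order
    (an assumption of this sketch).
    """
    b = block_size
    num_block_rows = len(row_ptr) - 1
    y = [0.0] * (num_block_rows * b)
    for bi in range(num_block_rows):
        for k in range(row_ptr[bi], row_ptr[bi + 1]):
            bj = col_idx[k]
            block = blocks[k]
            # Dense b-by-b block times the matching slice of x.
            for r in range(b):
                acc = 0.0
                for c in range(b):
                    acc += block[r * b + c] * x[bj * b + c]
                y[bi * b + r] += acc
    return y


# 4x4 matrix made of 2x2 blocks: block row 0 has blocks in block
# columns 0 and 1, block row 1 has one block in block column 1.
row_ptr = [0, 2, 3]
col_idx = [0, 1, 1]
blocks = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
]
y = bcsr_spmv(2, row_ptr, col_idx, blocks, [1.0, 1.0, 1.0, 1.0])
```

The inner two loops are a small dense matrix-vector product with a fixed trip count, which is the part that block-size-specific optimization would target.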

More Details

TYPE Conference Poster YEAR 2016

Scopus OSTI

Optimization of block sparse matrix-vector multiplication on shared-memory architectures

Eberhardt, Ryan E.; Hoemmen, Mark F.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2016

OSTI DOI

Ifpack2 User's Guide 1.0

Prokopenko, Andrey V.; Siefert, Christopher S.; Hu, Jonathan J.; Hoemmen, Mark F.; Klinvex, Alicia M.

This is the definitive user manual for the Ifpack2 package in the Trilinos project. Ifpack2 provides implementations of iterative algorithms (e.g., Jacobi, SOR, additive Schwarz) and processor-based incomplete factorizations. Ifpack2 is part of the Trilinos Tpetra solver stack, is templated on index, scalar, and node types, and leverages node-level parallelism indirectly through its use of Tpetra kernels. Ifpack2 can be used to solve matrix systems with greater than 2 billion rows (using 64-bit indices). Any options not documented in this manual should be considered strictly experimental.

More Details

TYPE SAND Report YEAR 2016

OSTI DOI

Thread parallelism in sparse linear algebra and iterative solvers

Hoemmen, Mark F.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2016

OSTI

Gradually porting an in-use sparse matrix library to use CUDA

Hoemmen, Mark F.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2016

OSTI

What Error to Expect When You Are Expecting a Bit Flip

Elliott, James J.; Hoemmen, Mark F.; Mueller, Frank M.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2016

OSTI

Embedded Ensemble Propagation for Improving Performance Portability and Scalability of Uncertainty Quantification on Emerging Computational Architectures

Phipps, Eric T.; D'Elia, Marta D.; Edwards, Harold C.; Hoemmen, Mark F.; Hu, Jonathan J.; Rajamanickam, Sivasankaran R.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2016

OSTI

Performance Portability for Linear Algebra with Kokkos

Trott, Christian R.; Edwards, Harold C.; Ellingwood, Nathan D.; Hammond, Simon D.; Deveci, Mehmet D.; Boman, Erik G.; Bradley, Andrew M.; Hoemmen, Mark F.; Rajamanickam, Sivasankaran R.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2016

OSTI

Getting the right answer despite incorrect hardware

Hoemmen, Mark F.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2015

OSTI

Scalable Linear Algebra Capability Area update

Hoemmen, Mark F.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2015

OSTI

Preconditioning Communication-Avoiding Krylov Methods

Rajamanickam, Sivasankaran R.; Yamazaki, Ichitaro Y.; Boman, Erik G.; Hoemmen, Mark F.; Heroux, Michael A.; Tomov, Stan T.; Dongarra, Jack D.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2015

OSTI

Optimal adiabatic scaling and the processor-in-memory-and-storage architecture (OAS+PIMS)

Proceedings of the 2015 IEEE/ACM International Symposium on Nanoscale Architectures, NANOARCH 2015

DeBenedictis, Erik; Cook, Jeanine C.; Hoemmen, Mark F.; Metodi, Tzvetan S.

We discuss a new approach to computing that retains the possibility of exponential growth while making substantial use of the existing technology. The exponential improvement path of Moore's Law has been the driver behind the computing approach of Turing, von Neumann, and FORTRAN-like languages. Performance growth is slowing at the system level, even though further exponential growth should be possible. We propose two technology shifts as a remedy, the first being the formulation of a scaling rule for scaling into the third dimension. This involves use of circuit-level energy efficiency increases using adiabatic circuits to avoid overheating. However, this scaling rule is incompatible with the von Neumann architecture. The second technology shift is a computer architecture and programming change to an extremely aggressive form of Processor-In-Memory (PIM) architecture, which we call Processor-In-Memory-and-Storage (PIMS). Theoretical analysis shows that the PIMS architecture is compatible with the 3D scaling rule, suggesting both immediate benefit and a long-term improvement path.

More Details

TYPE Conference Poster YEAR 2015

Scopus OSTI

Tpetra Project Overview

Hoemmen, Mark F.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2015

OSTI

Beyond Moore's Law and Implications for Computing in Space

DeBenedictis, Erik; Cook, Jeanine C.; Metodi, Tzvetan S.; Hoemmen, Mark F.; Marinella, Matthew J.; Schiek, Richard S.; Zima, Hans Z.

Abstract not provided.

More Details

TYPE Presentation YEAR 2015

OSTI

A numerical soft fault model for iterative linear solvers

HPDC 2015 - Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing

Elliott, James J.; Hoemmen, Mark F.; Mueller, Frank

We present a fault model designed to bring out the "worst" in iterative solvers based on mathematical properties. Our model introduces substantially higher overhead, but smaller variance, than a fault model based on random bit flips. We also relate the statistics from our experiments back to the solvers' configuration, and briefly address the computational effort that each model requires. Our approach requires significantly fewer resources, while punishing our solvers with undetectable errors that require notable overhead for recovery. This work also illustrates the robustness of our resilient algorithms: Not only do we make forward progress in the presence of pathological faults, we always obtain the correct answer.

More Details

TYPE Conference Poster YEAR 2015

Scopus OSTI

Preconditioning Communication-Avoiding Krylov Methods

Rajamanickam, Sivasankaran R.; Yamazaki, Ichitaro Y.; Boman, Erik G.; Hoemmen, Mark F.; Heroux, Michael A.; Tomov, Stanimire T.; Dongarra, Jack D.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2015

OSTI

The cost of reliability: Iterative linear solvers and reactive fault tolerance

Elliott, James J.; Hoemmen, Mark F.; Mueller, Frank M.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2015

OSTI

Progress report on MPI+X computational kernels for algebraic multigrid and iterative linear solvers

Hoemmen, Mark F.

Abstract not provided.

More Details

TYPE Presentation YEAR 2015

OSTI

Exploiting data representation for fault tolerance

Journal of Computational Science

Elliott, James J.; Hoemmen, Mark F.; Mueller, Frank M.

Incorrect computer hardware behavior may corrupt intermediate computations in numerical algorithms, possibly resulting in incorrect answers. Prior work models misbehaving hardware by randomly flipping bits in memory. We start by accepting this premise, and present an analytic model for the error introduced by a bit flip in an IEEE 754 floating-point number. We then relate this finding to the linear algebra concepts of normalization and matrix equilibration. In particular, we present a case study illustrating that normalizing both vector inputs of a dot product minimizes the probability of a single bit flip causing a large error in the dot product's result. Moreover, the absolute error is either less than one or very large, which allows detection of large errors. Then, we apply this to the GMRES iterative solver. We count all possible errors that can be introduced through faults in arithmetic in the computationally intensive orthogonalization phase of GMRES, and show that when the matrix is equilibrated, the absolute error is bounded above by one.
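The effect of a single bit flip on an IEEE 754 double can be explored directly. The sketch below covers only the scalar case and does not reproduce the paper's analytic model or its dot-product and GMRES results; the helper names `flip_bit` and `bit_flip_errors` are hypothetical. It compares a value of magnitude at most one (0.5) with a larger value (3.0).

```python
import struct
from typing import List


def flip_bit(value: float, bit: int) -> float:
    """Return `value` with one bit (0 = least significant) of its
    64-bit IEEE 754 representation flipped."""
    (as_int,) = struct.unpack("<Q", struct.pack("<d", value))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", as_int ^ (1 << bit)))
    return flipped


def bit_flip_errors(value: float) -> List[float]:
    """Absolute error caused by flipping each of the 64 bits in turn."""
    return [abs(flip_bit(value, bit) - value) for bit in range(64)]


errors_half = bit_flip_errors(0.5)
errors_three = bit_flip_errors(3.0)

# For 0.5, every flip yields an error of at most 1 or an enormous one
# (the top exponent bit), so large errors are easy to spot.
small_or_huge = all(e <= 1.0 or e > 1e300 for e in errors_half)

# For 3.0, some flips produce moderate errors that are harder to tell
# apart from legitimate values.
has_moderate = any(1.0 < e < 1e300 for e in errors_three)
```

In this scalar setting the separation between small and enormous errors holds for 0.5 but not for 3.0, which is consistent with the abstract's motivation for normalizing inputs.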

More Details

TYPE Journal Article YEAR 2015

OSTI DOI

Towards extreme-scale simulations for low Mach fluids with second-generation Trilinos

Parallel Processing Letters

Lin, Paul L.; Bettencourt, Matthew T.; Domino, Stefan P.; Fisher, Travis C.; Hoemmen, Mark F.; Hu, Jonathan J.; Phipps, Eric T.; Prokopenko, Andrey V.; Rajamanickam, Sivasankaran R.; Siefert, Christopher S.; Kennon, Stephen

Trilinos is an object-oriented software framework for the solution of large-scale, complex multi-physics engineering and scientific problems. While Trilinos was originally designed for scalable solutions of large problems, the fidelity needed by many simulations is significantly greater than what one could have envisioned two decades ago. When problem sizes exceed a billion elements even scalable applications and solver stacks require a complete revision. The second-generation Trilinos employs C++ templates in order to solve arbitrarily large problems. We present a case study of the integration of Trilinos with a low Mach fluids engineering application (SIERRA low Mach module/Nalu). Through the use of improved algorithms and better software engineering practices, we demonstrate good weak scaling for up to a nine billion element large eddy simulation (LES) problem on unstructured meshes with a 27 billion row matrix on 524,288 cores of an IBM Blue Gene/Q platform.

More Details

TYPE Journal Article YEAR 2014

Scopus OSTI DOI

Skeptical Programming and Selective Reliability

Elliott, James J.; Hoemmen, Mark F.; Mueller, Frank M.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2014

OSTI

Trilinos BlockCRS solver stack in high-order aerodynamics simulations

Fisher, Travis C.; Siefert, Christopher S.; Hoemmen, Mark F.; Prokopenko, Andrey P.

Abstract not provided.

More Details

TYPE Presentation YEAR 2014

OSTI

Threaded construction and fill of Tpetra sparse linear system using Kokkos

Hoemmen, Mark F.; Edwards, Harold C.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2014

OSTI

An overview of Trilinos

Hoemmen, Mark F.

Abstract not provided.

More Details

TYPE Presentation YEAR 2014

OSTI

Resilient iterative linear solvers via skeptical programming

Hoemmen, Mark F.; Elliott, James J.; Mueller, Frank M.

Abstract not provided.

More Details

TYPE Presentation YEAR 2014

OSTI

Resilient Iterative Linear Solvers

Hoemmen, Mark F.

Abstract not provided.

More Details

TYPE Conference YEAR 2014

OSTI

Migrating to Kokkos

Trott, Christian R.; Edwards, Harold C.; Hoemmen, Mark F.

Abstract not provided.

More Details

TYPE Conference YEAR 2014

OSTI

Towards Extreme-scale Simulations with Next-Generation Trilinos: a low Mach application case study

Lin, Paul L.; Siefert, Christopher S.; Cyr, Eric C.; Bettencourt, Matthew T.; Domino, Stefan P.; Fisher, Travis C.; Hoemmen, Mark F.; Hu, Jonathan J.; Phipps, Eric T.; Prokopenko, Andrey V.; Rajamanickam, Sivasankaran R.

Abstract not provided.

More Details

TYPE Conference YEAR 2014

OSTI DOI

Migrating to Kokkos

Trott, Christian R.; Hoemmen, Mark F.; Edwards, Harold C.

Abstract not provided.

More Details

TYPE Presentation YEAR 2014

OSTI

Domain Decomposition Preconditioners for Communication-Avoiding Krylov Methods on Distributed GPUs

Boman, Erik G.; Heroux, Michael A.; Hoemmen, Mark F.; Rajamanickam, Sivasankaran R.

Abstract not provided.

More Details

TYPE Conference YEAR 2014

OSTI

Tolerating Silent Data Corruption in Opaque Preconditioners

Hoemmen, Mark F.

Abstract not provided.

More Details

TYPE Conference YEAR 2014

OSTI

Trilinos' MPI+X - friendly sparse linear algebra interface

Hoemmen, Mark F.

Abstract not provided.

More Details

TYPE Conference YEAR 2014

OSTI

Fault-tolerant algorithms minisymposium at SIAM Parallel Processing

SIAM (Society for Industrial and Applied Mathematics) News

Hoemmen, Mark F.

Abstract not provided.

More Details

TYPE Journal Article YEAR 2014

OSTI

Domain Decomposition Preconditioners for Communication-Avoiding Krylov Methods on a Hybrid CPU/GPU Cluster

International Conference for High Performance Computing, Networking, Storage and Analysis, SC

Yamazaki, Ichitaro; Rajamanickam, Sivasankaran R.; Boman, Erik G.; Hoemmen, Mark F.; Heroux, Michael A.; Tomov, Stanimire

Krylov subspace projection methods are widely used iterative methods for solving large-scale linear systems of equations. Researchers have demonstrated that communication-avoiding (CA) techniques can improve Krylov methods' performance on modern computers, where communication is becoming increasingly expensive compared to arithmetic operations. In this paper, we extend these studies with two major contributions. First, we present our implementation of a CA variant of the Generalized Minimum Residual (GMRES) method, called CA-GMRES, for solving nonsymmetric linear systems of equations on a hybrid CPU/GPU cluster. Our performance results on up to 120 GPUs show that CA-GMRES gives a speedup of up to 2.5x in total solution time over standard GMRES on a hybrid cluster with twelve Intel Xeon CPUs and three Nvidia Fermi GPUs on each node. We then outline a domain decomposition framework to introduce a family of preconditioners that are suitable for CA Krylov methods. Our preconditioners do not incur any additional communication and allow the easy reuse of existing algorithms and software for the subdomain solves. Experimental results on the hybrid CPU/GPU cluster demonstrate that CA-GMRES with preconditioning achieves a speedup of up to 7.4x over CA-GMRES without preconditioning, and a speedup of up to 1.7x over GMRES with preconditioning in total solution time. These results confirm the potential of our framework to develop a practical and effective preconditioned CA Krylov method.

More Details

TYPE Conference YEAR 2014

Scopus OSTI

Evaluating the impact of SDC on the GMRES iterative solver

Proceedings of the International Parallel and Distributed Processing Symposium, IPDPS

Elliott, James; Hoemmen, Mark F.; Mueller, Frank

Increasing parallelism and transistor density, along with increasingly tighter energy and peak power constraints, may force exposure of occasionally incorrect computation or storage to application codes. Silent data corruption (SDC) will likely be infrequent, yet one SDC suffices to make numerical algorithms like iterative linear solvers cease progress towards the correct answer. Thus, we focus on resilience of the iterative linear solver GMRES to a single transient SDC. We derive inexpensive checks to detect the effects of an SDC in GMRES that work for a more general SDC model than presuming a bit flip. Our experiments show that when GMRES is used as the inner solver of an inner-outer iteration, it can 'run through' SDC of almost any magnitude in the computationally intensive orthogonalization phase. That is, it gets the right answer using faulty data without any required roll back. Those SDCs that it cannot run through are caught by our detection scheme.

More Details

TYPE Conference YEAR 2014

Scopus OSTI

Resilience in Numerical Methods: A Position on Fault Models and Methodologies

Hoemmen, Mark F.; Elliott, James J.

Abstract not provided.

More Details

TYPE Conference YEAR 2014

OSTI

Exploiting Data Representation for Fault Tolerance

Hoemmen, Mark F.

Abstract not provided.

More Details

TYPE Conference YEAR 2013

OSTI

Enabling extreme-scale simulations with next-generation Trilinos for Sierra low Mach fluid application code

Lin, Paul L.; Rajamanickam, Sivasankaran R.; Siefert, Christopher S.; Bettencourt, Matthew T.; Cyr, Eric C.; Domino, Stefan P.; Fisher, Travis C.; Hoemmen, Mark F.; Hu, Jonathan J.; Phipps, Eric T.; Prokopenko, Andrey V.

Abstract not provided.

More Details

TYPE Conference YEAR 2013

OSTI

Studying the performance of CA-GMRES on multicores with multiple GPUs

Hoemmen, Mark F.

Abstract not provided.

More Details

TYPE Conference YEAR 2013

OSTI

Quantifying the Impact of Single Bit Flips on GMRES

Elliott, James J.; Hoemmen, Mark F.

Abstract not provided.

More Details

TYPE Presentation YEAR 2013

OSTI

Recent trends in numerical linear algebra: Avoiding communication and tolerating computer hardware faults

Hoemmen, Mark F.

Abstract not provided.

More Details

TYPE Conference YEAR 2013

OSTI

Fault-tolerant solvers via algorithm / system codesign

Hoemmen, Mark F.

Abstract not provided.

More Details

TYPE Conference YEAR 2013

OSTI

Next-generation programming models: What we need and do not need

Hoemmen, Mark F.; Heroux, Michael A.

Abstract not provided.

More Details

TYPE Conference YEAR 2013

OSTI

Fault-tolerant solvers via algorithm / system codesign

Hoemmen, Mark F.

Abstract not provided.

More Details

TYPE Conference YEAR 2012

OSTI

Using Miniapplications in a Mantevo Framework for Optimizing Sandia's SPARC CFD Code on Multi-Core Many-Core and GPU-Accelerated Compute Platforms

Bettencourt, Matthew T.; Hoemmen, Mark F.

Abstract not provided.

More Details

TYPE Conference YEAR 2012

OSTI

Cooperative application/OS DRAM fault recovery

Hoemmen, Mark F.; Ferreira, Kurt; Heroux, Michael A.; Brightwell, Ronald B.

Exascale systems will present considerable fault-tolerance challenges to applications and system software. These systems are expected to suffer several hard and soft errors per day. Unfortunately, many fault-tolerance methods in use, such as rollback recovery, are unsuitable for many expected errors, for example DRAM failures. As a result, applications will need to address these resilience challenges to more effectively utilize future systems. In this paper, we describe work on a cross-layer application/OS framework to handle uncorrected memory errors. We illustrate the use of this framework through its integration with a new fault-tolerant iterative solver within the Trilinos library, and present initial convergence results.

More Details

TYPE SAND Report YEAR 2012

OSTI DOI

Proposal for a future-scalable linear algebra interface for Krylov methods

Hoemmen, Mark F.

Abstract not provided.

More Details

TYPE Conference YEAR 2012

OSTI

Effective and Efficient Handling of Ill-Conditioned Correlation Matrices in Kriging and Gradient Enhanced Kriging Emulators Through Pivoted Cholesky Factorization

Dalbey, Keith D.; Day, David M.; Hoemmen, Mark F.

Abstract not provided.

More Details

TYPE Conference YEAR 2012

OSTI

Fault-tolerant iterative methods via selective reliability

Ferreira, Kurt; Heroux, Michael A.; Hoemmen, Mark F.

Abstract not provided.

More Details

TYPE Conference YEAR 2012

OSTI

Communication-Avoiding GMRES Implementation Issues

Hoemmen, Mark F.

Abstract not provided.

More Details

TYPE Conference YEAR 2012

OSTI

Fault-tolerant iterative methods via selective reliability

Hoemmen, Mark F.; Heroux, Michael A.; Ferreira, Kurt

Abstract not provided.

More Details

TYPE Conference YEAR 2011

OSTI

Copy of Next-generation iterative solvers for next-generation computing: Anasazi and Belos

Hoemmen, Mark F.

Abstract not provided.

More Details

TYPE Conference YEAR 2011

OSTI

Next-generation iterative solvers for next-generation computing: Anasazi and Belos

Hoemmen, Mark F.

Abstract not provided.

More Details

TYPE Conference YEAR 2011

OSTI

A Tutorial on Anasazi and Belos

Thornquist, Heidi K.; Hoemmen, Mark F.; Heroux, Michael A.; Lehoucq, Richard B.; Parks, Michael L.; Day, David M.

Abstract not provided.

More Details

TYPE Presentation YEAR 2011

OSTI

An overview of Trilinos

Hoemmen, Mark F.

Abstract not provided.

More Details

TYPE Conference YEAR 2011

OSTI

Architecture-aware algorithms for extreme-scale computing

Hoemmen, Mark F.

Abstract not provided.

More Details

TYPE Conference YEAR 2011

OSTI

Cooperative Application/OS DRAM Fault Recovery

Hoemmen, Mark F.; Ferreira, Kurt; Heroux, Michael A.; Brightwell, Ronald B.

Abstract not provided.

More Details

TYPE Conference YEAR 2011

OSTI

A communication-avoiding hybrid-parallel rank-revealing orthogonalization

Hoemmen, Mark F.

Abstract not provided.

More Details

TYPE Conference YEAR 2011

OSTI

A communication-avoiding hybrid-parallel rank-revealing orthogonalization method

Hoemmen, Mark F.

Abstract not provided.

More Details

TYPE Conference YEAR 2011

OSTI

A communication-avoiding, hybrid-parallel, rank-revealing orthogonalization method

Hoemmen, Mark F.

Orthogonalization consumes much of the run time of many iterative methods for solving sparse linear systems and eigenvalue problems. Commonly used algorithms, such as variants of Gram-Schmidt or Householder QR, have performance dominated by communication. Here, 'communication' includes both data movement between the CPU and memory, and messages between processors in parallel. Our Tall Skinny QR (TSQR) family of algorithms requires asymptotically fewer messages between processors and data movement between CPU and memory than typical orthogonalization methods, yet achieves the same accuracy as Householder QR factorization. Furthermore, in block orthogonalizations, TSQR is faster and more accurate than existing approaches for orthogonalizing the vectors within each block ('normalization'). TSQR's rank-revealing capability also makes it useful for detecting deflation in block iterative methods, for which existing approaches sacrifice performance, accuracy, or both. We have implemented a version of TSQR that exploits both distributed-memory and shared-memory parallelism, and supports real and complex arithmetic. Our implementation is optimized for the case of orthogonalizing a small number (5-20) of very long vectors. The shared-memory parallel component uses Intel's Threading Building Blocks, though its modular design supports other shared-memory programming models as well, including computation on the GPU. Our implementation achieves speedups of 2 times or more over competing orthogonalizations. It is available now in the development branch of the Trilinos software package, and will be included in the 10.8 release.
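The reduction idea behind TSQR can be sketched as follows: split the tall, skinny matrix into row blocks, factor each block independently, then repeatedly stack pairs of small R factors and factor them again until one R remains. This is a minimal sequential sketch in pure Python, not the Trilinos implementation. It computes only the R factor, it assumes a binary reduction tree, and the names `householder_r` and `tsqr_r` are hypothetical.

```python
import math

def householder_r(A):
    """Return the n x n upper-triangular R of an m x n matrix (m >= n),
    computed with Householder reflections. Q is not formed."""
    R = [row[:] for row in A]
    m, n = len(R), len(R[0])
    for k in range(n):
        x = [R[i][k] for i in range(k, m)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue                      # column already zero below row k
        alpha = -math.copysign(norm_x, x[0])
        v = x[:]
        v[0] -= alpha                     # reflector vector v = x - alpha*e1
        vtv = sum(t * t for t in v)
        for j in range(k, n):             # apply I - 2 v v^T / (v^T v)
            s = 2.0 * sum(v[i] * R[k + i][j] for i in range(len(v))) / vtv
            for i in range(len(v)):
                R[k + i][j] -= s * v[i]
    return [[R[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]

def tsqr_r(A, num_blocks):
    """R factor of a tall, skinny matrix via a binary reduction tree."""
    m, n = len(A), len(A[0])
    rows_per_block = m // num_blocks
    assert rows_per_block >= n, "each block needs at least n rows"
    # Stage 1: independent local factorizations, one per row block.
    # In a parallel setting each block would live on its own processor.
    local_rs = []
    for blk in range(num_blocks):
        start = blk * rows_per_block
        stop = m if blk == num_blocks - 1 else start + rows_per_block
        local_rs.append(householder_r(A[start:stop]))
    # Stage 2: combine R factors pairwise; only n x n blocks move.
    while len(local_rs) > 1:
        merged = []
        for i in range(0, len(local_rs) - 1, 2):
            merged.append(householder_r(local_rs[i] + local_rs[i + 1]))
        if len(local_rs) % 2 == 1:
            merged.append(local_rs[-1])   # odd one out waits for next round
        local_rs = merged
    return local_rs[0]

# An 8 x 2 example matrix split into 4 row blocks.
A = [[float(i + 1), float((i * i) % 7 + 1)] for i in range(8)]
R = tsqr_r(A, 4)
```

Because each reduction step exchanges only an n x n triangle, the number of messages grows with the depth of the tree rather than with the number of columns processed one at a time. The resulting R satisfies R^T R = A^T A up to rounding, and matches the R from a direct factorization up to the signs of its rows.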

More Details

TYPE Conference YEAR 2010

OSTI