Publications

Results 51–100 of 266

Evaluation of Hardware-Based MPI Acceleration on Astra

Aguilar, Michael J.; Pedretti, Kevin P.; Hammond, Simon D.; Laros, James H.; Younge, Andrew J.; Curry, Matthew L.

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

SST_GPU: An Execution-Driven CUDA Kernel Scheduler and Streaming-Multiprocessor Compute Model

Khairy, Mahmoud K.; Zhang, Mengchi Z.; Green, Roland G.; Hammond, Simon D.; Hoekstra, Robert J.; Rogers, Timothy R.; Hughes, Clayton H.

Programmable accelerators have become commonplace in modern computing systems. Advances in programming models and the availability of massive amounts of data have created a space for massively parallel acceleration where the contexts for thousands of concurrent threads are resident on-chip. These threads are grouped and interleaved on a cycle-by-cycle basis among several massively parallel computing cores. The design of future supercomputers relies on an ability to model the performance of these massively parallel cores at scale. To address the need for a scalable, decentralized GPU model that can model large GPUs, chiplet-based GPUs and multi-node GPUs, this report details the first steps in integrating the open-source, execution-driven GPGPU-Sim into the SST framework. The first stage of this project creates two elements: a kernel scheduler SST element that accepts work from SST CPU models and schedules it to an SM-collection element, which performs cycle-by-cycle timing using SST's memHierarchy to model a flexible memory system.

TYPE SAND Report YEAR 2019

OSTI DOI

Analyzing Build System Pressure for Large Applications: Experience from the US DOE ASC Program

Lin, Paul L.; Hammond, Simon D.

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

Vanguard Astra: A Prototype Petascale Arm Supercomputer

Hughes, Clayton H.; Laros, James H.; Pedretti, Kevin P.; Hammond, Simon D.; Younge, Andrew J.; Hoekstra, Robert J.

Abstract not provided.

TYPE Presentation YEAR 2019

OSTI

Vanguard Astra: A Prototype Petascale Arm Supercomputer

Hughes, Clayton H.; Laros, James H.; Pedretti, Kevin P.; Hammond, Simon D.; Younge, Andrew J.; Hoekstra, Robert J.

Abstract not provided.

TYPE Presentation YEAR 2019

OSTI

Enforcing Fairness in Disaggregated Non-Volatile Memory Systems

Kommareddy, Vamsee R.; Awad, Amro A.; Hughes, Clayton H.; Hammond, Simon D.

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

Sandia ATDM DevOps and Performance Analysis

Hoekstra, Robert J.; Bartlett, Roscoe B.; Hammond, Simon D.; Cook, Jeanine C.; Dinge, Dennis D.; Frye, Joseph R.; Hughes, Clayton H.; Lin, Paul L.; Vaughan, Courtenay T.

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

Astra

Laros, James H.; Pedretti, Kevin P.; Hammond, Simon D.; Alvin, Kenneth F.

Abstract not provided.

TYPE Conference Poster YEAR 2019

OSTI

The Astra Supercomputer

Hammond, Simon D.; Laros, James H.; Younge, Andrew J.; Pedretti, Kevin P.; Hoekstra, Robert J.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Evaluating the Intel Skylake Xeon Processor for HPC Workloads

Proceedings - 2018 International Conference on High Performance Computing and Simulation, HPCS 2018

Hammond, Simon D.; Vaughan, Courtenay T.; Hughes, Clayton H.

Despite significant advances in the porting of scientific applications to novel architectures such as compute-optimized graphics processors, many-core processors/accelerators, and even special-purpose function units, the vast majority of scientific calculations are still performed on high-performance, commodity server processors. Even in the cases of applications which have been ported to new architectures, frequent serial sections still require strong server-class processor cores to compute as fast as possible. In this paper we report on a set of benchmark studies which evaluate Intel's latest Skylake Xeon server processor. Skylake represents a significant change in the Xeon product line with wider SIMD vector units, a redesigned cache architecture, and an increased number of memory channels. The wider vector units provide a 2x improvement for some compute-intensive applications, and the combined memory changes can provide close to 2x the memory bandwidth. We evaluate these new hardware features on several HPC-relevant mini-applications and benchmarks, including STREAM, LULESH, XSBench, HPCG and SW4Lite. Together, the new hardware functions provide up to 1.8x speedup on HPC benchmark codes when compared with the previous generation Haswell processor core, providing much greater utility to a broader range of HPC applications that rely on this class of compute node.

TYPE Conference Poster YEAR 2018

Scopus OSTI DOI

Exploring Allocation Policies in Disaggregated Non-Volatile Memories

Kommareddy, Vamsee R.; Awad, Amro A.; Hughes, Clayton H.; Hammond, Simon D.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Vanguard Astra - Petascale ARM Platform for U.S. DOE/ASC Supercomputing

Younge, Andrew J.; Laros, James H.; Hammond, Simon D.; Pedretti, Kevin P.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Vanguard L2 Milestone Review

Laros, James H.; Pedretti, Kevin P.; Hammond, Simon D.

Abstract not provided.

TYPE Presentation YEAR 2018

OSTI

Towards Lightweight and Scalable Simulation of Large-Scale OpenSHMEM Applications

Levenhagen, Michael J.; Hammond, Simon D.; Hemmert, Karl S.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Analyzing Build System Pressure for the ASC Program

Hammond, Simon D.; Hoekstra, Robert J.

Abstract not provided.

TYPE Presentation YEAR 2018

OSTI

FY18 L2 Milestone #8759 Report: Vanguard Astra and ATSE - an ARM-based Advanced Architecture Prototype System and Software Environment

Laros, James H.; Pedretti, Kevin P.; Hammond, Simon D.; Aguilar, Michael J.; Curry, Matthew L.; Grant, Ryan E.; Hoekstra, Robert J.; Klundt, Ruth A.; Monk, Stephen T.; Ogden, Jeffry B.; Olivier, Stephen L.; Scott, Randall D.; Ward, Harry L.; Younge, Andrew J.

The Vanguard program informally began in January 2017 with the submission of a white paper entitled "Sandia's Vision for a 2019 Arm Testbed" to NNSA headquarters. The program proceeded in earnest in May 2017 with an announcement by Doug Wade (Director, Office of Advanced Simulation and Computing and Institutional R&D at NNSA) that Sandia National Laboratories (Sandia) would host the first Advanced Architecture Prototype platform based on the Arm architecture. In August 2017, Sandia formed a Tri-lab team chartered to develop a robust HPC software stack for Astra to support the Vanguard program goal of demonstrating the viability of Arm in supporting ASC production computing workloads. This document describes the high-level Vanguard program goals, the Vanguard-Astra project acquisition plan and procurement up to contract placement, the initial software stack environment planned for the Vanguard-Astra platform (Astra), a description of how the communities of users will utilize the platform during the transition from the open network to the classified network, and initial performance results.

TYPE SAND Report YEAR 2018

OSTI DOI

FY18 L2 Milestone #6360 Report: Initial Capability of an Arm-based Advanced Architecture Prototype System and Software Environment

Laros, James H.; Pedretti, Kevin P.; Hammond, Simon D.; Aguilar, Michael J.; Curry, Matthew L.; Grant, Ryan E.; Hoekstra, Robert J.; Klundt, Ruth A.; Monk, Stephen T.; Ogden, Jeffry B.; Olivier, Stephen L.; Scott, Randall D.; Ward, Harry L.; Younge, Andrew J.

The Vanguard program informally began in January 2017 with the submission of a white paper entitled "Sandia's Vision for a 2019 Arm Testbed" to NNSA headquarters. The program proceeded in earnest in May 2017 with an announcement by Doug Wade (Director, Office of Advanced Simulation and Computing and Institutional R&D at NNSA) that Sandia National Laboratories (Sandia) would host the first Advanced Architecture Prototype platform based on the Arm architecture. In August 2017, Sandia formed a Tri-lab team chartered to develop a robust HPC software stack for Astra to support the Vanguard program goal of demonstrating the viability of Arm in supporting ASC production computing workloads. This document describes the high-level Vanguard program goals, the Vanguard-Astra project acquisition plan and procurement up to contract placement, the initial software stack environment planned for the Vanguard-Astra platform (Astra), a description of how the communities of users will utilize the platform during the transition from the open network to the classified network, and initial performance results.

TYPE SAND Report YEAR 2018

OSTI DOI

Optimizing for KNL usage modes when data doesn’t fit in MCDRAM

ACM International Conference Proceeding Series

Butcher, Neil; Olivier, Stephen L.; Berry, Jonathan W.; Hammond, Simon D.; Kogge, Peter M.

Technologies such as Multi-Channel DRAM (MCDRAM) or High Bandwidth Memory (HBM) provide significantly more bandwidth than conventional memory. This trend has raised questions about how applications should manage data transfers between levels. This paper focuses on evaluating different usage modes of the MCDRAM in Intel Knights Landing (KNL) manycore processors. We evaluate these usage modes with a sorting kernel and a sorting-based streaming benchmark. We develop a performance model for the benchmark and use experimental evidence to demonstrate the correctness of the model. The model projects near-optimal numbers of copy threads for memory bandwidth bound computations. We demonstrate on KNL up to a 1.9X speedup for sort when the problem does not fit in MCDRAM over an OpenMP GNU sort that does not use MCDRAM.

TYPE Conference Poster YEAR 2018

Scopus OSTI

Optimizing for KNL Usage Modes When Data Doesn't Fit in MCDRAM

Berry, Jonathan W.; Butcher, Neil; Olivier, Stephen L.; Hammond, Simon D.; Kogge, Peter M.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Opal: A Centralized Memory Manager for Investigating Disaggregated Memory Systems

Kommareddy, Vamsee R.; Hughes, Clayton H.; Hammond, Simon D.; Awad, Amro A.

Many modern applications have memory footprints that are increasingly large, driving system memory capacities higher and higher. Moreover, these systems are often organized so that the bulk of the memory is collocated with the compute capability, which necessitates message passing APIs to facilitate information sharing between compute nodes. Due to the diversity of applications that must run on High-Performance Computing (HPC) systems, the memory utilization can fluctuate wildly from one application to another. And, because memory is located in the node, maintenance can become problematic because each node must be taken offline and upgraded individually. To address these issues, vendors are exploring disaggregated, memory-centric systems. In this type of organization, there are discrete nodes, reserved solely for memory, which are shared across many compute nodes. Due to their capacity, low power, and non-volatility, Non-Volatile Memories (NVMs) are ideal candidates for these memory nodes. This report discusses a new component for the Structural Simulation Toolkit (SST), Opal, that can be used to study the impact of using NVMs in a disaggregated system in terms of performance, security, and memory management.

TYPE SAND Report YEAR 2018

OSTI DOI

Building 725 Astra and Vanguard

Lacy, Susan L.; Noe, John P.; Ogden, Jeffry B.; Hammond, Simon D.

Abstract not provided.

TYPE Other Report YEAR 2018

OSTI DOI

Evaluating the Intel Skylake Xeon Processor for HPC Workloads

Hughes, Clayton H.; Vaughan, Courtenay T.; Hammond, Simon D.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Vanguard Astra - Petascale ARM Platform for U.S. DOE/ASC Supercomputing

Younge, Andrew J.; Pedretti, Kevin P.; Laros, James H.; Hammond, Simon D.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Application Performance Insights via System Monitoring

Brandt, James M.; Gentile, Ann C.; Hammond, Simon D.; Cook, Jeanine C.; Allan, Benjamin A.; Tucker, Thomas T.; Naksinehaboon, Nichamon N.; Taerat, Narate T.; Cook, Jonathan C.; Aaziz, Omar R.; Ates, Emre A.; Tuncer, Ozan T.; Egele, Manuel E.; Turk, Ata T.; Coskun, Ayse K.; Izadpanah, Ramin I.; Dechev, Damian D.

Abstract not provided.

TYPE Presentation YEAR 2018

OSTI

Profiling and Debugging Support for the Kokkos Programming Model

Hammond, Simon D.; Trott, Christian R.; Ibanez-Granados, Daniel A.; Sunderland, Daniel S.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI DOI

On the Use of Vectorization in Production Engineering Workloads

Vaughan, Courtenay T.; Cook, Jeanine C.; Benner, R.E.; Dinge, Dennis D.; Lin, Paul L.; Hughes, Clayton H.; Hoekstra, Robert J.; Hammond, Simon D.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Sandia Laboratories Comanche Collaboration (System Administration and Operations)

Aguilar, Michael J.; Laros, James H.; Pedretti, Kevin P.; Hammond, Simon D.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Profiling and Debugging for the Kokkos Programming Model

Hammond, Simon D.; Trott, Christian R.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI DOI

Vanguard Astra: Maturing the ARM Software Ecosystem for U.S. DOE/ASC Supercomputing

Pedretti, Kevin P.; Laros, James H.; Hammond, Simon D.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Vanguard Astra: Maturing the ARM Software Ecosystem for U.S. DOE/ASC Supercomputing

Pedretti, Kevin P.; Laros, James H.; Hammond, Simon D.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

The Vanguard Advanced Tri-lab Software Environment (ATSE) Project

Pedretti, Kevin P.; Laros, James H.; Hammond, Simon D.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Structural Simulation Toolkit (SST) Tutorial

Hammond, Simon D.; Rodrigues, Arun; Voskuilen, Gwendolyn R.; Hemmert, Karl S.; Levenhagen, Michael J.; Hughes, Clayton H.; Hoekstra, Robert J.

Abstract not provided.

TYPE Presentation YEAR 2018

OSTI

Profiling and Debugging Support for Performance Portable Programming Models

Hammond, Simon D.; Trott, Christian R.; Ibanez-Granados, Daniel A.; Sunderland, Daniel S.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

On the Use of Vectorization in Production Engineering Workloads

Vaughan, Courtenay T.; Hammond, Simon D.; Dinge, Dennis D.; Lin, Paul L.; Hughes, Clayton H.; Benner, R.E.; Cook, Jeanine C.; Pase, Douglas M.; Hoekstra, Robert J.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

DOE NNSA Vanguard Program

Laros, James H.; Alvin, Kenneth F.; Hoekstra, Robert J.; Pedretti, Kevin P.; Hammond, Simon D.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Sparse Matrix-Matrix Multiplication on Multilevel Memory Architectures: Algorithms and Experiments

Deveci, Mehmet D.; Hammond, Simon D.; Wolf, Michael W.; Rajamanickam, Sivasankaran R.

Architectures with multiple classes of memory media are becoming a common part of mainstream supercomputer deployments. So-called multi-level memories offer differing characteristics for each memory component, including variation in bandwidth, latency and capacity. This paper investigates the performance of sparse matrix multiplication kernels on two leading high-performance computing architectures: Intel's Knights Landing processor and NVIDIA's Pascal GPU. We describe a data placement method and a chunking-based algorithm for our kernels that exploit the existence of the multiple memory spaces in each hardware platform. We evaluate the performance of these methods w.r.t. standard algorithms using the auto-caching mechanisms. Our results show that standard algorithms that exploit cache reuse performed as well as multi-memory-aware algorithms for architectures such as KNLs where the memory subsystems have similar latencies. However, for architectures such as GPUs where memory subsystems differ significantly in both bandwidth and latency, multi-memory-aware methods are crucial for good performance. In addition, our new approaches permit the user to run problems that require larger capacities than the fastest memory of each compute node without depending on the software-managed cache mechanisms.

TYPE Other Report YEAR 2018

OSTI DOI

Multi-threaded Sparse Matrix Matrix Multiplication with Applications in Scientific Computing and Graph Analytics

Deveci, Mehmet D.; Wolf, Michael W.; Berry, Jonathan W.; Rajamanickam, Sivasankaran R.; Boman, Erik G.; Trott, Christian R.; Hammond, Simon D.; Olivier, Stephen L.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Vector-friendly Batched BLAS and LAPACK Kernels: Design and Applications

Rajamanickam, Sivasankaran R.; Kim, Kyungjoo K.; Bradley, Andrew M.; Deveci, Mehmet D.; Trott, Christian R.; Hammond, Simon D.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

ECP Hardware and Integration - Hardware Evaluation All Hands

Hammond, Simon D.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Enhanced Profiling for Kokkos Applications

Hammond, Simon D.; Trott, Christian R.; Ibanez-Granados, Daniel A.; Edwards, Harold C.; Sunderland, Daniel S.; Ellingwood, Nathan D.; Brandt, James M.; Gentile, Ann C.; Cook, Jeanine C.; Hoekstra, Robert J.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Continuous Performance Tracking for Kokkos Applications Using LDMS

Brandt, James M.; Hammond, Simon D.; Tucker, Thomas T.; Gentile, Ann C.; Cook, Jeanine C.

Abstract not provided.

TYPE Presentation YEAR 2018

OSTI

Threaded Assembly in Aria Expressions

Clausen, Jonathan C.; Brunini, Victor B.; Forster, Chris F.; Noble, David R.; Hoemmen, Mark F.; Hammond, Simon D.; Trott, Christian R.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Interconnect Working Group

Hemmert, Karl S.; Bair, Ray B.; Bhatele, Abhinav B.; Groves, Taylor G.; Hammond, Simon D.; Jain, Nikhil J.; Levenhagen, Michael J.; Mubarak, Misbah M.; Pakin, Scott P.; Ross, Rob R.; Wilke, Jeremiah J.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

SST Simulation Framework (and Complex Memory)

Hammond, Simon D.; Hughes, Clayton H.; Awad, Amro A.; Voskuilen, Gwendolyn R.; Rodrigues, Arun; Hemmert, Karl S.; Levenhagen, Michael J.; Hoekstra, Robert J.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Analyzing Exascale Memory Architectures Using the SST Toolkit

Hughes, Clayton H.; Awad, Amro A.; Hammond, Simon D.; Rodrigues, Arun; Hemmert, Karl S.; Hoekstra, Robert J.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Sandia ATDM Performance Execution Tools & Analysis

Hammond, Simon D.; Vaughan, Courtenay T.; Dinge, Dennis D.; Lin, Paul L.; Benner, R.E.; Hughes, Clayton H.; Trott, Christian R.; Cook, Jeanine C.; Hoekstra, Robert J.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

Towards a Scalable Integrated Simulation Framework for Extreme Heterogeneity in High Performance Computing

Hammond, Simon D.; Rodrigues, Arun; Hemmert, Karl S.; Voskuilen, Gwendolyn R.; Hughes, Clayton H.; Levenhagen, Michael J.; Hoekstra, Robert J.; Ang, James A.

Abstract not provided.

TYPE Conference Poster YEAR 2018

OSTI

NALU Engineering Application Overview

Hammond, Simon D.; Hoekstra, Robert J.; Rodrigues, Arun; Ang, James A.

Abstract not provided.

TYPE Presentation YEAR 2018

OSTI

Designing vector-friendly compact BLAS and LAPACK kernels

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2017

Kim, Kyungjoo K.; Costa, Timothy B.; Deveci, Mehmet D.; Bradley, Andrew M.; Hammond, Simon D.; Guney, Murat E.; Knepper, Sarah; Story, Shane; Rajamanickam, Sivasankaran R.

Many applications, such as PDE-based simulations and machine learning, apply BLAS/LAPACK routines to large groups of small matrices. While existing batched BLAS APIs provide meaningful speedup for this problem type, a non-canonical data layout enabling cross-matrix vectorization may provide further significant speedup. In this paper, we propose a new compact data layout that interleaves matrices in blocks according to the SIMD vector length. We combine this compact data layout with a new interface to BLAS/LAPACK routines that can be used within a hierarchical parallel application. Our layout provides up to 14x, 45x, and 27x speedup against OpenMP loops around optimized DGEMM, DTRSM and DGETRF kernels, respectively, on the Intel Knights Landing architecture. We discuss the compact batched BLAS/LAPACK implementations in two libraries, KokkosKernels and Intel® Math Kernel Library. We demonstrate the APIs in a line solver for coupled PDEs. Finally, we present a detailed performance analysis of our kernels.

TYPE Conference Poster YEAR 2017

Scopus OSTI DOI

Towards an Open Source Eco-System for Future HPC Designs

Hammond, Simon D.

Abstract not provided.

TYPE Conference Poster YEAR 2017

OSTI
