Publications

45 Results

Processing Particle Data Flows with SmartNICs

Liu, Jianshen L.; Maltzahn, Carlos M.; Curry, Matthew L.; Ulmer, Craig D.

Many distributed applications implement complex data flows and need a flexible mechanism for routing data between producers and consumers. Recent advances in programmable network interface cards, or SmartNICs, represent an opportunity to offload data-flow tasks into the network fabric, thereby freeing the hosts to perform other work. System architects in this space face multiple questions about the best way to leverage SmartNICs as processing elements in data flows. In this paper, we advocate the use of Apache Arrow as a foundation for implementing data-flow tasks on SmartNICs. We report on our experiences adapting a partitioning algorithm for particle data to Apache Arrow and measure the on-card processing performance for the BlueField-2 SmartNIC. Our experiments confirm that the BlueField-2’s (de)compression hardware can have a significant impact on in-transit workflows where data must be unpacked, processed, and repacked.

More Details

TYPE Other Report YEAR 2022

OSTI DOI

Leveraging SmartNICs in Data Management Tasks for High-Performance Computing

Ulmer, Craig D.; Curry, Matthew L.; Maltzahn, Carlos M.; Liu, Jianshen L.

Abstract not provided.

More Details

TYPE Presentation YEAR 2021

OSTI

Performance Characteristics of the BlueField-2 SmartNIC

Liu, Jianshen L.; Maltzahn, Carlos M.; Ulmer, Craig D.; Curry, Matthew L.

High-performance computing (HPC) researchers have long envisioned scenarios where application workflows could be improved through the use of programmable processing elements embedded in the network fabric. Recently, vendors have introduced programmable Smart Network Interface Cards (SmartNICs) that enable computations to be offloaded to the edge of the network. There is great interest in both the HPC and high-performance data analytics (HPDA) communities in understanding the roles these devices may play in the data paths of upcoming systems. This paper focuses on characterizing both the networking and computing aspects of NVIDIA’s new BlueField-2 SmartNIC when used in a 100Gb/s Ethernet environment. For the networking evaluation we conducted multiple transfer experiments between processors located at the host, the SmartNIC, and a remote host. These tests illuminate how much effort is required to saturate the network and help estimate the processing headroom available on the SmartNIC during transfers. For the computing evaluation we used the stress-ng benchmark to compare the BlueField-2 to other servers and place realistic bounds on the types of offload operations that are appropriate for the hardware. Our findings from this work indicate that while the BlueField-2 provides a flexible means of processing data at the network’s edge, great care must be taken to not overwhelm the hardware. While the host can easily saturate the network link, the SmartNIC’s embedded processors may not have enough computing resources to sustain more than half the expected bandwidth when using kernel-space packet processing. From a computational perspective, encryption operations, memory operations under contention, and on-card IPC operations on the SmartNIC perform significantly better than the general-purpose servers used for comparisons in our experiments. Therefore, applications that mainly focus on these operations may be good candidates for offloading to the SmartNIC.

More Details

TYPE Other Report YEAR 2021

OSTI DOI

BeeGFS on Demand on Stria: Initial Integration and Experiments

Aguilar, Michael J.; Regier, Phillip A.; Pedretti, Kevin P.; Curry, Matthew L.; Ogden, Jeffry B.; Ward, Harry L.

Abstract not provided.

More Details

TYPE Conference Presentation YEAR 2021

OSTI DOI

CephFS experiments on stria.sandia.gov

Widener, Patrick W.; Curry, Matthew L.

This report is an institutional record of experiments conducted to explore performance of a vendor installation of CephFS on the SNL stria cluster. Comparisons between CephFS, the Lustre parallel file system, and NFS were done using the IOR and MDTEST benchmarking tools, a test program which uses the SEACAS/Trilinos IOSS library, and the checkpointing activity performed by the LAMMPS molecular dynamics simulation.

More Details

TYPE SAND Report YEAR 2020

OSTI DOI

Implementing a Common HPC Environment in a Multi-User Spack Instance

Woods, Carson W.; Curry, Matthew L.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2019

OSTI

System-Wide ATSE Deployments Using Spack

Curry, Matthew L.

Abstract not provided.

More Details

TYPE Presentation YEAR 2019

OSTI

Modeling Resilience Needs for Burst Buffers

Curry, Matthew L.

Abstract not provided.

More Details

TYPE Presentation YEAR 2019

OSTI

Successes and Challenges for Machine Learning at Sandia National Laboratories

Curry, Matthew L.

Abstract not provided.

More Details

TYPE Presentation YEAR 2019

OSTI

Supercomputing: Applications History and Architecture

Curry, Matthew L.

Abstract not provided.

More Details

TYPE Presentation YEAR 2019

OSTI

Erasure Coding on File Transfer Appliance for Nearline Storage With High Degree Sharding

Haddock, Walker H.; Curry, Matthew L.; Bangalore, Purushotham V.; Skjellum, Anthony S.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2019

OSTI

High performance erasure coding for very large stripe sizes

2019 Spring Simulation Conference, SpringSim 2019

Haddock, Walker; Bangalore, Purushotham V.; Curry, Matthew L.; Skjellum, Anthony

Exascale computing demands high bandwidth and low latency I/O on the computing edge. Object storage systems can provide higher bandwidth and lower latencies than tape archive. File transfer nodes present a single point of mediation through which data moving between these storage systems must pass. By increasing the performance of erasure coding, stripes can be subdivided into large numbers of shards. This paper's contribution is a prototype nearline disk object storage system based on Ceph. We show that using general purpose graphical processing units (GPGPU) for erasure coding on file transfer nodes is effective when using a large number of shards. We describe an architecture for nearline disk archive storage for use with high performance computing (HPC) and demonstrate the performance with benchmarking results. We compare the benchmark performance of our design with the Intel® Storage Acceleration Library (ISA-L) CPU based erasure coding libraries using the native Ceph erasure coding feature.

More Details

TYPE Conference Poster YEAR 2019

Scopus OSTI DOI

Evaluation of Hardware-Based MPI Acceleration on Astra

Aguilar, Michael J.; Pedretti, Kevin P.; Hammond, Simon D.; Laros, James H.; Younge, Andrew J.; Curry, Matthew L.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2019

OSTI

I/O Research on Astra the World's Largest ARM Supercomputer

Curry, Matthew L.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2019

OSTI

Bytes are Bytes Right?

Curry, Matthew L.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2018

OSTI

FY18 L2 Milestone #8759 Report: Vanguard Astra and ATSE - an ARM-based Advanced Architecture Prototype System and Software Environment

Laros, James H.; Pedretti, Kevin P.; Hammond, Simon D.; Aguilar, Michael J.; Curry, Matthew L.; Grant, Ryan E.; Hoekstra, Robert J.; Klundt, Ruth A.; Monk, Stephen T.; Ogden, Jeffry B.; Olivier, Stephen L.; Scott, Randall D.; Ward, Harry L.; Younge, Andrew J.

The Vanguard program informally began in January 2017 with the submission of a white paper entitled "Sandia's Vision for a 2019 Arm Testbed" to NNSA headquarters. The program proceeded in earnest in May 2017 with an announcement by Doug Wade (Director, Office of Advanced Simulation and Computing and Institutional R&D at NNSA) that Sandia National Laboratories (Sandia) would host the first Advanced Architecture Prototype platform based on the Arm architecture. In August 2017, Sandia formed a Tri-lab team chartered to develop a robust HPC software stack for Astra to support the Vanguard program goal of demonstrating the viability of Arm in supporting ASC production computing workloads. This document describes the high-level Vanguard program goals, the Vanguard-Astra project acquisition plan and procurement up to contract placement, the initial software stack environment planned for the Vanguard-Astra platform (Astra), a description of how the communities of users will utilize the platform during the transition from the open network to the classified network, and initial performance results.

More Details

TYPE SAND Report YEAR 2018

OSTI DOI

FY18 L2 Milestone #6360 Report: Initial Capability of an Arm-based Advanced Architecture Prototype System and Software Environment

Laros, James H.; Pedretti, Kevin P.; Hammond, Simon D.; Aguilar, Michael J.; Curry, Matthew L.; Grant, Ryan E.; Hoekstra, Robert J.; Klundt, Ruth A.; Monk, Stephen T.; Ogden, Jeffry B.; Olivier, Stephen L.; Scott, Randall D.; Ward, Harry L.; Younge, Andrew J.

The Vanguard program informally began in January 2017 with the submission of a white paper entitled "Sandia's Vision for a 2019 Arm Testbed" to NNSA headquarters. The program proceeded in earnest in May 2017 with an announcement by Doug Wade (Director, Office of Advanced Simulation and Computing and Institutional R&D at NNSA) that Sandia National Laboratories (Sandia) would host the first Advanced Architecture Prototype platform based on the Arm architecture. In August 2017, Sandia formed a Tri-lab team chartered to develop a robust HPC software stack for Astra to support the Vanguard program goal of demonstrating the viability of Arm in supporting ASC production computing workloads. This document describes the high-level Vanguard program goals, the Vanguard-Astra project acquisition plan and procurement up to contract placement, the initial software stack environment planned for the Vanguard-Astra platform (Astra), a description of how the communities of users will utilize the platform during the transition from the open network to the classified network, and initial performance results.

More Details

TYPE SAND Report YEAR 2018

OSTI DOI

The New Mexico Supercomputing Challenge

Curry, Matthew L.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2018

OSTI

Scientific Modeling of Storage System Reliability

Curry, Matthew L.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI

Campaign Storage: Erasure Coding With GPUs

Haddock, Walker H.; Curry, Matthew L.; Bangalore, Purushotham V.; Skjellum, Anthony S.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI DOI

GPU Erasure Coding for Campaign Storage

Curry, Matthew L.; Haddock, Walker H.; Bangalore, Purushotham V.; Skjellum, Anthony S.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2017

OSTI

GPU erasure coding for campaign storage

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Haddock, Walker; Curry, Matthew L.; Bangalore, Purushotham V.; Skjellum, Anthony

High-performance computing (HPC) demands high bandwidth and low latency in I/O performance leading to the development of storage systems and I/O software components that strive to provide greater and greater performance. However, capital and energy budgets along with increasing storage capacity requirements have motivated the search for lower cost, large storage systems for HPC. With Burst Buffer technology increasing the bandwidth and reducing the latency for I/O between the compute and storage systems, the back-end storage bandwidth and latency requirements can be reduced, especially underneath an adequately sized modern parallel file system. Cloud computing has led to the development of large, low-cost storage solutions where design has focused on high capacity, availability, and low energy consumption at lowest cost. Cloud computing storage systems leverage duplicates and erasure coding technology to provide high availability at much lower cost than traditional HPC storage systems. Leveraging certain cloud storage infrastructure and concepts in HPC would be valuable economically in terms of cost-effective performance for certain storage tiers. To enable the use of cloud storage technologies for HPC we study the architecture for interfacing cloud storage between the HPC parallel file systems and the archive storage. In this paper, we report our comparison of two erasure coding implementations for the Ceph file system. We compare measurements of various degrees of sharding that are relevant for HPC applications. We show that the Gibraltar GPU Erasure coding library outperforms a CPU implementation of an erasure coding plugin for the Ceph object storage system, opening the potential for new ways to architect such storage systems based on Ceph.

More Details

TYPE Conference Poster YEAR 2017

Scopus OSTI DOI

Sirocco - Motivation and Overview

Curry, Matthew L.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2016

OSTI

Sirocco - An Overview

Curry, Matthew L.

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2016

OSTI

An Overview of Sirocco

Curry, Matthew L.

Abstract not provided.

More Details

TYPE Presentation YEAR 2015

OSTI

Cryogenic amplifiers for fast readout

England, Troy D.; Tracy, Lisa A.; Curry, Matthew L.; Carr, Stephen M.; Lilly, Michael L.; Carroll, Malcolm

Abstract not provided.

More Details

TYPE Conference Poster YEAR 2015

OSTI

Motivation and Design of the Sirocco Storage System Version 1.0

Curry, Matthew L.; Ward, Harry L.; Danielson, Geoffrey C.

Sirocco is a massively parallel, high performance storage system for the exascale era. It emphasizes client-to-client coordination, low server-side coupling, and free data movement to improve resilience and performance. Its architecture is inspired by peer-to-peer and victim-cache architectures. By leveraging these ideas, Sirocco natively supports several media types, including RAM, flash, disk, and archival storage, with automatic migration between levels. Sirocco also includes storage interfaces and support that are more advanced than typical block storage. Sirocco enables clients to efficiently use key-value storage or block-based storage with the same interface. It also provides several levels of transactional data updates within a single storage command, including full ACID-compliant updates. This transaction support extends to updating several objects within a single transaction. Further support is provided for concurrency control, enabling greater performance for workloads while providing safe concurrent modification. By pioneering these and other technologies and techniques in the storage system, Sirocco is poised to fulfill a need for a massively scalable, write-optimized storage system for exascale systems. This is version 1.0 of a document reflecting the current and planned state of Sirocco. Further versions of this document will be accessible at http://www.cs.sandia.gov/Scalable_IO/sirocco.

More Details

TYPE SAND Report YEAR 2015

OSTI DOI

Enabling Capabilities for Integrated Application Workflows

Lofstead, Gerald F.; Curry, Matthew L.; Fabian, Nathan D.; Kordenbrock, Todd H.; Mukherjee, Shyamali M.; Oldfield, Ron A.; Sjaardema, Gregory D.; Templet, Gary J.; Ulmer, Craig D.; Widener, Patrick W.

Abstract not provided.

More Details

TYPE Presentation YEAR 2015

OSTI

Virtual Disks and Oblivious Storage: Why Linux for HPC Needs a New Block Layer

Curry, Matthew L.

Abstract not provided.

More Details

TYPE Presentation YEAR 2014

OSTI

Localstore/IDB

Curry, Matthew L.

Abstract not provided.

More Details

TYPE Presentation YEAR 2014

OSTI

I/O Systems and Power in the Exascale Era

Curry, Matthew L.; Ward, Harry L.; Martinez, David J.

Abstract not provided.

More Details

TYPE Presentation YEAR 2014

OSTI

Fourier-Assisted Modeling of Hard Disk Drive Access Times

Oldfield, Ron A.; Ward, Harry L.; Widener, Patrick W.; Kroeger, Thomas M.; Curry, Matthew L.

Abstract not provided.

More Details

TYPE Conference YEAR 2013

OSTI

Redundant Array of Inexpensive Interactive Disks (RAI^2D)

Curry, Matthew L.; Ward, Harry L.

Abstract not provided.

More Details

TYPE Conference YEAR 2013

OSTI

Sirocco Status

Curry, Matthew L.

Abstract not provided.

More Details

TYPE Presentation YEAR 2013

OSTI

Behavior-Based Simulation of Storage Devices

Ward, Harry L.; Oldfield, Ron A.; Widener, Patrick W.; Curry, Matthew L.

Abstract not provided.

More Details

TYPE Conference YEAR 2013

OSTI

Sirocco: A File System for Exascale

Curry, Matthew L.; Ward, Harry L.

Abstract not provided.

More Details

TYPE Presentation YEAR 2013

OSTI

RAID: Motivation and Implementation

Curry, Matthew L.

Abstract not provided.

More Details

TYPE Presentation YEAR 2012

OSTI

Demonstration of a Legacy Application's Path to Exascale - ASC L2 Milestone 4467

Barrett, Brian B.; Kelly, Suzanne M.; Klundt, Ruth A.; Laros, James H.; Leung, Vitus J.; Levenhagen, Michael J.; Lofstead, Gerald F.; Moreland, Kenneth D.; Oldfield, Ron A.; Pedretti, Kevin P.; Rodrigues, Arun; Barrett, Richard F.; Ward, Harry L.; Vandyke, John P.; Vaughan, Courtenay T.; Wheeler, Kyle B.; Brandt, James M.; Brightwell, Ronald B.; Curry, Matthew L.; Fabian, Nathan D.; Ferreira, Kurt; Gentile, Ann C.; Hemmert, Karl S.

Abstract not provided.

More Details

TYPE Presentation YEAR 2012

OSTI

Report of experiments and evidence for ASC L2 milestone 4467 : demonstration of a legacy application's path to exascale

Barrett, Brian B.; Kelly, Suzanne M.; Klundt, Ruth A.; Laros, James H.; Leung, Vitus J.; Levenhagen, Michael J.; Lofstead, Gerald F.; Moreland, Kenneth D.; Oldfield, Ron A.; Pedretti, Kevin P.; Rodrigues, Arun; Barrett, Richard F.; Ward, Harry L.; Vandyke, John P.; Vaughan, Courtenay T.; Wheeler, Kyle B.; Brandt, James M.; Brightwell, Ronald B.; Curry, Matthew L.; Fabian, Nathan D.; Ferreira, Kurt; Gentile, Ann C.; Hemmert, Karl S.

This report documents thirteen of Sandia's contributions to the Computational Systems and Software Environment (CSSE) within the Advanced Simulation and Computing (ASC) program between fiscal years 2009 and 2012. It describes their impact on ASC applications. Most contributions are implemented in lower software levels allowing for application improvement without source code changes. Improvements are identified in such areas as reduced run time, characterizing power usage, and Input/Output (I/O). Other experiments are more forward looking, demonstrating potential bottlenecks using mini-application versions of the legacy codes and simulating their network activity on Exascale-class hardware. The purpose of this report is to prove that the team has completed milestone 4467: Demonstration of a Legacy Application's Path to Exascale. Cielo is expected to be the last capability system on which existing ASC codes can run without significant modifications. This assertion will be tested to determine where the breaking point is for an existing highly scalable application. The goal is to stretch the performance boundaries of the application by applying recent CSSE R&D in areas such as resilience, power, I/O, visualization services, SMARTMAP, lightweight kernels (LWKs), virtualization, simulation, and feedback loops. Dedicated system time reservations and/or CCC allocations will be used to quantify the impact of system-level changes to extend the life and performance of the ASC code base. Finally, a simulation of anticipated exascale-class hardware will be performed using SST to supplement the calculations. Determine where the breaking point is for an existing highly scalable application: Chapter 15 presented the CSSE work that sought to identify the breaking point in two ASC legacy applications, Charon and CTH. Their mini-app versions were also employed to complete the task. There is no single breaking point as more than one issue was found with the two codes. The results were that applications can expect to encounter performance issues related to the computing environment, system software, and algorithms. Careful profiling of runtime performance will be needed to identify the source of an issue, in strong combination with knowledge of system software and application source code.

More Details

TYPE SAND Report YEAR 2012

OSTI DOI

Valuing and Managing Data Based on Embodied Energy

Lofstead, Gerald F.; Oldfield, Ron A.; Curry, Matthew L.; Laros, James H.

Abstract not provided.

More Details

TYPE Presentation YEAR 2012

OSTI

Power Use of Disk Subsystems in Supercomputers

Ward, Harry L.; Martinez, David J.; Curry, Matthew L.

Abstract not provided.

More Details

TYPE Conference YEAR 2011

OSTI

Addressing Scalable I/O Challenges for Exascale

Oldfield, Ron A.; Ferreira, Kurt; Ward, Harry L.; Curry, Matthew L.

Abstract not provided.

More Details

TYPE Conference YEAR 2011

OSTI

Gibraltar RAID - 2011 R&D 100 Awards Entry Form

Curry, Matthew L.; Ward, Harry L.; Brightwell, Ronald B.

Abstract not provided.

More Details

TYPE Presentation YEAR 2011

OSTI

A GPU-Based Storage System

Curry, Matthew L.

Abstract not provided.

More Details

TYPE Presentation YEAR 2010

OSTI

A highly reliable RAID system based on GPUs

Curry, Matthew L.

While RAID is the prevailing method of creating reliable secondary storage infrastructure, many users desire more flexibility than offered by current implementations. To attain needed performance, customers have often sought after hardware-based RAID solutions. This talk describes a RAID system that offloads erasure correction coding calculations to GPUs, allowing increased reliability by supporting new RAID levels while maintaining high performance.

More Details

TYPE Conference YEAR 2010

OSTI
