The CCR enterprise is closely tied to the laboratories’ broader set of missions and strategies. We share responsibility within Sandia as stewards of important capabilities for the nation in high-strain-rate physics, scientific visualization, optimization, uncertainty quantification, scalable solvers, inverse methods, and computational materials. We also leverage our core technologies to execute projects through various partnerships, such as Cooperative Research and Development Agreements (CRADAs) and Strategic Partnerships, as well as partnerships with universities.
Advanced Tri-lab Software Environment (ATSE)
Albany
Albany is an implicit, unstructured grid, finite element code for the solution and analysis of partial differential equations. Albany is the main demonstration application of the AgileComponents software development strategy at Sandia. It is a PDE code that strives to be built almost entirely from functionality contained within reusable libraries (such as Trilinos/STK/Dakota/PUMI). Albany plays a large role in demonstrating and maturing functionality of new libraries, and also in the interfaces and interoperability between these libraries. It also serves to expose gaps in our coverage of capabilities and interface design.
In addition to the component-based code design strategy, Albany also attempts to showcase the concept of Analysis Beyond Simulation, where an application code is developed up front for a design and analysis mission. All Albany applications are born with the ability to perform sensitivity analysis, stability analysis, optimization, and uncertainty quantification, with clean interfaces for exposing design parameters for manipulation by analysis algorithms.
Albany also attempts to be a model for software engineering tools and processes, so that new research codes can adopt the Albany infrastructure as a good starting point. This effort involves a close collaboration with the 1400 SEMS team.
The Albany code base is host to several application projects, notably:
- LCM (Laboratory for Computational Mechanics) [PI J. Ostien]: A platform for research in finite deformation mechanics, including algorithms for failure and fracture modeling, discretizations, implicit solution algorithms, advanced material models, coupling to particle-based methods, and efficient implementations on new architectures.
- QCAD (Quantum Computer Aided Design) [PI Nielsen]: A code to aid in the design of quantum dots built in doped silicon devices. QCAD solves the coupled Schrödinger-Poisson system. When QCAD is wrapped in Dakota, optimal operating conditions can be found.
- FELIX (Finite Element for Land Ice eXperiments) [PI Salinger]: This application solves variants of a nonlinear Stokes flow to simulate the evolution of ice sheets. In particular, we model the Greenland and Antarctic ice sheets to study the effects of climate change, particularly their influence on sea-level rise. FELIX will be linked into ACME.
- Aeras [PI Spotz]: A component-based approach to atmospheric modeling, where advanced analysis algorithms and design for efficient code on new architectures are built into the code.
In addition, Albany is used as a platform for algorithmic research:
- FASTMath SciDAC project: We are developing a capability for adaptive mesh refinement within an unstructured grid application, in collaboration with Mark Shephard’s group at the SCOREC center at RPI.
- Embedded UQ: Research into embedded UQ algorithms led by Eric Phipps often uses Albany as a demonstration platform.
- Performance Portable Kernels for new architectures: Albany is serving as a research vehicle for programming finite element assembly kernels using the Trilinos/Kokkos programming model and library.
ALEGRA
The ALEGRA application is targeted at the simulation of high strain rate, magnetohydrodynamic, electromechanical, and high energy density physics phenomena for the U.S. defense and energy programs. Research and development in advanced methods, including code frameworks, large-scale inline meshing, multiscale Lagrangian hydrodynamics, resistive magnetohydrodynamic methods, material interface reconstruction, and code verification and validation, keeps the software on the cutting edge of high performance computing.
E3SM – Energy Exascale Earth System Model
E3SM is an unprecedented collaboration among seven National Laboratories, the National Center for Atmospheric Research, four academic institutions and one private-sector company to develop and apply the most complete leading-edge climate and Earth system models to the most challenging and demanding climate change research imperatives. It is the only major national modeling project designed to address U.S. Department of Energy (DOE) mission needs and specifically targets DOE Leadership Computing Facility resources now and in the future, because DOE researchers do not have access to other major climate computing centers. A major motivation for the E3SM project is the coming paradigm shift in computing architectures and their related programming models as capability moves into the Exascale era. DOE, through its science programs and early adoption of new computing architectures, traditionally leads many scientific communities, including climate and Earth system simulation, through these disruptive changes in computing.
ECP Supercontainers
Container computing has revolutionized how many industries and enterprises develop and deploy software and services. Recently, this model has gained traction in the High Performance Computing (HPC) community through enabling technologies including Charliecloud, Shifter, Singularity, and Docker. In this same trend, container-based computing paradigms have gained significant interest within the DOE/NNSA Exascale Computing Project (ECP). While containers provide greater software flexibility, reliability, ease of deployment, and portability for users, there are still several challenges in this area for Exascale.
The goal of the ECP Supercomputing Containers Project (called Supercontainers) is to use a multi-level approach to accelerate adoption of container technologies for Exascale, ensuring that HPC container runtimes will be scalable, interoperable, and integrated into Exascale supercomputing across DOE. The core components of the Supercontainers project focus on four areas: foundational system software research needed to ensure containers can be deployed at scale; enhanced user and developer support for ECP Application Development (AD) and Software Technology (ST) projects looking to use containers; validated container runtime interoperability; and integration of containers with both vendor and E6 facilities system software.
FASTMath
The FASTMath SciDAC Institute develops and deploys scalable mathematical algorithms and software tools for reliable simulation of complex physical phenomena and collaborates with application scientists to ensure the usefulness and applicability of FASTMath technologies.
FASTMath is a collaboration between Argonne National Laboratory, Lawrence Berkeley National Laboratory, Lawrence Livermore National Laboratory, Massachusetts Institute of Technology, Rensselaer Polytechnic Institute, Sandia National Laboratories, Southern Methodist University, and University of Southern California. Dr. Esmond Ng, LBNL, leads the FASTMath project.
Hobbes – Extreme-Scale Operating Systems Project
Hobbes was a Sandia-led collaboration between four national laboratories and eight universities supported by the DOE Office of Science Advanced Scientific Computing Research program office. The goal of this three-year project was to deliver an operating system for future extreme-scale parallel computing platforms that will address the major technical challenges of energy efficiency, managing massive parallelism and deep memory hierarchies, and providing resilience in the presence of increasing failures. Our approach was to enable application composition through lightweight virtualization. Application composition is a critical capability that will be the foundation of the way extreme-scale systems must be used in the future. The tighter integration of modeling and simulation capability with analysis and the increasing complexity of application workflows demand more sophisticated machine usage models and new system-level services. Ensemble calculations for uncertainty quantification, large graph analytics, multi-materials and multi-physics applications are just a few examples that are driving the need for these new system software interfaces and mechanisms for managing memory, network, and computational resources. Rather than providing a single unified operating system and runtime system that supports several parallel programming models, Hobbes leveraged lightweight virtualization to provide the flexibility to construct and efficiently execute custom OS/R environments. Hobbes extended previous work on the Kitten lightweight operating system and the Palacios lightweight virtual machine monitor.
HPC Resource Allocation
HPC resource allocation consists of a pipeline of methods by which distributed-memory work is assigned to distributed-memory resources to complete that work. This pipeline spans both system and application level software. At the system level, it consists of scheduling and allocation. At the application level, broadly speaking, it consists of discretization (meshing), partitioning, and task mapping. Scheduling, given requests for resources and available resources, decides which request will be assigned resources next or when a request will be assigned resources. When a request is granted, allocation decides which specific resources will be assigned to that request. For the application, HPC resource allocation begins with the discretization and partitioning of the work into a distributed-memory model and ends with the task mapping that matches the allocated resources to the partitioned work. Additionally, network architecture and routing have a strong impact on HPC resource allocation. Each of these problems is solved independently and makes assumptions about how the other problems are solved. We have worked in all of these areas and have recently begun work to combine some of them, in particular allocation and task mapping. We have used analysis, simulation, and real system experiments in this work. Techniques specific to any particular application have not been considered in this work.
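The system-level and application-level stages described above can be illustrated with a toy sketch. It assumes a first-come-first-served scheduler, an allocator that simply takes the lowest-numbered free nodes, and a block task mapping. These policies, the function names, and the `(name, node_count, num_tasks)` request format are illustrative assumptions, not the methods studied in this work.

```python
from collections import deque


def schedule(queue, free_nodes):
    """Scheduling (FCFS): return the head request if it fits, else None."""
    if queue and queue[0][1] <= len(free_nodes):
        return queue.popleft()
    return None


def allocate(free_nodes, count):
    """Allocation: choose the `count` lowest-numbered free nodes."""
    chosen = sorted(free_nodes)[:count]
    free_nodes.difference_update(chosen)
    return chosen


def map_tasks(num_tasks, nodes):
    """Task mapping: place tasks on the allocated nodes in contiguous blocks."""
    per_node = -(-num_tasks // len(nodes))  # ceiling division
    return {task: nodes[task // per_node] for task in range(num_tasks)}


def run_pipeline(requests, num_nodes):
    """Run requests (name, node_count >= 1, num_tasks) through all stages.

    Returns the task placement of every started job and the names of the
    jobs still waiting in the queue.
    """
    queue = deque(requests)
    free_nodes = set(range(num_nodes))
    placements = {}
    while (job := schedule(queue, free_nodes)) is not None:
        name, node_count, num_tasks = job
        nodes = allocate(free_nodes, node_count)
        placements[name] = map_tasks(num_tasks, nodes)
    return placements, [name for name, _, _ in queue]


if __name__ == "__main__":
    jobs = [("a", 2, 4), ("b", 3, 3), ("c", 1, 1)]
    print(run_pipeline(jobs, num_nodes=4))
```

Even in this toy, the stages depend on one another: the strict FCFS scheduler leaves job `c` waiting behind `b` although a node is free for it, and the quality of the block mapping depends on which nodes the allocator returned. That coupling is one reason to consider stages such as allocation and task mapping together.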
Institute for the Design of Advanced Energy Systems (IDAES)
Transforming and decarbonizing the world’s energy systems to make them environmentally sustainable while maintaining high reliability and low cost is a task that requires the very best computational and simulation capabilities to examine a complete range of technology options, to ensure that the best choices are made, and to support their rapid and effective implementation.
The Institute for the Design of Advanced Energy Systems (IDAES) was formed to bring the most advanced modeling and optimization capabilities to these challenges. The resulting IDAES integrated platform utilizes the most advanced computational algorithms to enable the design and optimization of complex, interacting energy and process systems from individual plant components to the entire electrical grid.
IDAES is led by the National Energy Technology Laboratory (NETL) with participants from Lawrence Berkeley National Laboratory (LBNL), Sandia National Laboratories (SNL), Carnegie Mellon University, West Virginia University, University of Notre Dame, and Georgia Institute of Technology.
The IDAES leadership team is:
- David C. Miller, Technical Director
- Anthony Burgard, NETL PI
- Deb Agarwal, LBNL PI
- John Siirola, SNL PI
Kitten Lightweight Kernel
Kitten is a current-generation lightweight kernel (LWK) compute node operating system designed for large-scale parallel computing systems. Kitten is the latest in a long line of successful LWKs, including SUNMOS, Puma, Cougar, and Catamount. Kitten distinguishes itself from these prior LWKs by providing a Linux-compatible user environment, a more modern and extendable open-source codebase, and a virtual machine monitor capability via Palacios that allows full-featured guest operating systems to be loaded on-demand.
Kokkos
Modern high performance computing (HPC) nodes have diverse and heterogeneous types of cores and memory. For applications and domain-specific libraries/languages to scale, port, and perform well on these next generation architectures, their on-node algorithms must be re-engineered for thread scalability and performance portability. The Kokkos programming model and its C++ library implementation help HPC applications and domain libraries implement intra-node thread-scalable algorithms that are performance portable across diverse manycore architectures such as multicore CPUs, Intel Xeon Phi, NVIDIA GPUs, and AMD GPUs.
This research, development, and deployment project advances the Kokkos programming model with new intra-node parallel algorithm abstractions, implements these abstractions in the Kokkos library, and supports applications’ and domain libraries’ effective use of Kokkos through consulting and tutorials. The project fosters numerous internal and external collaborations, especially with the ISO/C++ language standard committee to promote Kokkos abstractions into future ISO/C++ language standards. Kokkos is part of the DOE Exascale Computing Project.
Mesquite
Mesquite is a linkable software library that applies a variety of node-movement algorithms to improve the quality of a given mesh and/or adapt it. Mesquite uses advanced smoothing and optimization to:
- Untangle meshes,
- Provide local size control,
- Improve angles, orthogonality, and skew,
- Increase minimum edge-lengths for increased time-steps,
- Improve mesh smoothness,
- Perform anisotropic smoothing,
- Improve surface meshes, adapt to surface curvature,
- Improve hybrid meshes (including pyramids & wedges),
- Smooth meshes with hanging nodes,
- Maintain quality of moving and/or deforming meshes,
- Perform ALE rezoning,
- Improve mesh quality on and near boundaries,
- Improve transitions across internal boundaries,
- Align meshes with vector fields, and
- R-adapt meshes to solutions using error estimates.
Mesquite improves surface or volume meshes that are structured, unstructured, hybrid, or non-conformal. A variety of element types are permitted. Mesquite is designed to be as efficient as possible so that large meshes can be improved.
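To make the idea of node movement concrete, here is a minimal sketch of Laplacian smoothing, perhaps the simplest node-movement technique: each free node is moved to the average position of its neighbors while the connectivity stays fixed. It is a stand-in chosen for brevity and does not represent Mesquite's algorithms or its API; the function name and the dictionary-based mesh format are assumptions made for this example.

```python
def laplacian_smooth(coords, neighbors, fixed, iterations=10):
    """Move each free node to the average of its neighbors' positions.

    coords:    dict node_id -> (x, y)
    neighbors: dict node_id -> list of adjacent node ids
    fixed:     set of node ids (e.g. boundary nodes) that must not move
    Returns a new dict of coordinates; the input is left unchanged.
    """
    coords = dict(coords)
    for _ in range(iterations):
        updated = dict(coords)
        for node, adjacent in neighbors.items():
            if node in fixed or not adjacent:
                continue
            # Use positions from the previous sweep (Jacobi-style update).
            updated[node] = (
                sum(coords[n][0] for n in adjacent) / len(adjacent),
                sum(coords[n][1] for n in adjacent) / len(adjacent),
            )
        coords = updated
    return coords


if __name__ == "__main__":
    # Four fixed corner nodes and one badly placed interior node.
    mesh = {0: (0.0, 0.0), 1: (2.0, 0.0), 2: (2.0, 2.0), 3: (0.0, 2.0), 4: (1.7, 0.3)}
    links = {4: [0, 1, 2, 3]}
    print(laplacian_smooth(mesh, links, fixed={0, 1, 2, 3}, iterations=1)[4])
```

Plain Laplacian smoothing can invert elements in non-convex regions, which is one reason optimization-based approaches to mesh quality exist.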
Portals Interconnect API
Portals is an interconnect API intended to allow scalable, high-performance network communication between nodes of a large-scale parallel computing system. Portals is based on a building blocks approach that enables multiple upper-level protocols, such as MPI and SHMEM, to be used simultaneously within a process. This approach also encapsulates important semantics that can be offloaded to a network interface controller (NIC) to optimize performance-critical functionality. Previous generations of Portals have been deployed on large-scale production systems, including the Intel ASCI Red machine and Cray’s SeaStar interconnect for their XT product line. The current generation API is being used to enable advanced NIC architecture research for future extreme-scale systems.
Power API
Stitch – IO Library for highly localized simulations
IO libraries typically focus on writing the entire simulation domain for each output. For many computation classes, this is the correct choice. However, there are some cases where this approach is wasteful in time and space.
The Stitch library was developed initially for use with the SPPARKS kinetic Monte Carlo simulation to handle IO tasks for a welding simulation. This simulation type has a particular feature: the computational intensity is concentrated in a small part of the simulation domain, with the rest being idle. Given this intensity, writing only the area that changes is far more space efficient than writing the entire simulation domain for each output. Further, the computation can be focused strictly on the area where the data will change rather than on the whole domain. This can reduce the process count from 1024 to 16 and the data written to 1/64th of the original. Combined, these savings can reduce the computation time with no loss in data quality. If anything, reducing the amount written each time makes more output possible.
This approach is also applicable for finite element codes that share the same localized physics.
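The figures quoted above are consistent with an active region that covers 1/64 of the domain. The following sketch works through that arithmetic under two assumptions that the text does not state: the grid sizes are hypothetical (a 1024 x 1024 domain with a 128 x 128 active window), and both process count and output size scale linearly with the number of cells handled.

```python
from fractions import Fraction
from math import ceil, prod


def localized_savings(domain_shape, active_shape, full_processes):
    """Estimate savings from computing and writing only the active region.

    Returns (fraction of the domain that is active, processes needed),
    assuming work and output size are proportional to the cell count.
    """
    fraction = Fraction(prod(active_shape), prod(domain_shape))
    processes = max(1, ceil(full_processes * fraction))
    return fraction, processes


if __name__ == "__main__":
    fraction, processes = localized_savings((1024, 1024), (128, 128), 1024)
    print(f"data written per output: {fraction} of the full domain")
    print(f"processes needed: {processes}")
```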
The code is in the final stages of copyright review, after which it will be released. A work-in-progress paper was presented at PDSW-DISCS @ SC18; a full CS conference paper is planned for H1 2019, followed by a materials science journal paper.
Structural Simulation Toolkit (SST)
The Structural Simulation Toolkit (SST) enables co-design of extreme-scale architectures by allowing simulation of diverse aspects of hardware and software relevant to such environments. Innovations in instruction set architecture, memory systems, the network interface, and full system network can be explored in the context of design choices for the programming model and algorithms. The package provides two novel capabilities. The first is a fully modular design that enables extensive exploration of an individual system parameter without the need for intrusive changes to the simulator. The second is a parallel simulation environment based on MPI. This provides a high level of performance and the ability to look at large systems. The framework has been successfully used to model concepts ranging from processing in memory to conventional processors connected by conventional network interfaces and running MPI.
Vanguard
The Vanguard project is expanding the high-performance computing ecosystem by evaluating and accelerating the development of emerging technologies in order to increase their viability for future large-scale production platforms. The goal of the project is to reduce the risk in deploying unproven technologies by identifying gaps in the hardware and software ecosystem and making focused investments to address them. The approach is to make early investments that identify the essential capabilities needed to move technologies from small-scale testbed to large-scale production use.
XPRESS – eXascale Programming Environment and System Software
The XPRESS Project was one of four major projects of the DOE Office of Science Advanced Scientific Computing Research X-stack Program, initiated in September 2012. The purpose of XPRESS was to devise an innovative system software stack to enable practical and useful exascale computing around the end of the decade, with near-term contributions to efficient and scalable operation of trans-Petaflops performance systems in the next two to three years, both in support of DOE mission-critical applications. To this end, XPRESS directly addressed critical challenges in computing efficiency, scalability, and programmability through introspective methods of dynamic adaptive resource management and task scheduling.
The Zoltan project focuses on parallel algorithms for combinatorial scientific computing, including partitioning, load balancing, task placement, graph coloring, matrix ordering, distributed data directories, and unstructured communication plans.
Zoltan
The Zoltan toolkit is an open-source library of MPI-based distributed memory algorithms. It includes geometric and hypergraph partitioners, global graph coloring, distributed data directories using rendezvous algorithms, primitives to simplify data movement and unstructured communication, and interfaces to the ParMETIS, Scotch and PaToH partitioning libraries. It is written in C and can be used as a stand-alone library.
The Zoltan2 toolkit is the next-generation toolkit for multicore architectures. It includes MPI+OpenMP algorithms for geometric partitioning, architecture-aware task placement, and local matrix ordering. It is written in templated C++ and is tightly integrated with the Trilinos toolkit.
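As an illustration of the geometric partitioning mentioned above, the following is a minimal serial sketch of recursive coordinate bisection (RCB), a classic geometric method: sort the points along the axis of greatest extent, cut them into two groups, and recurse. It is a toy written for this overview and does not use the Zoltan or Zoltan2 API; the function name and the list-of-tuples input format are assumptions.

```python
def rcb(points, num_parts):
    """Partition points by recursive coordinate bisection.

    points:    list of coordinate tuples, all of the same dimension
    num_parts: number of parts to produce (any positive integer)
    Returns a list of `num_parts` lists of point indices.
    """

    def split(indices, parts):
        if parts == 1:
            return [indices]
        if not indices:
            return [[] for _ in range(parts)]
        dim = len(points[0])
        # Cut perpendicular to the axis along which the points spread most.
        extents = [
            max(points[i][d] for i in indices) - min(points[i][d] for i in indices)
            for d in range(dim)
        ]
        axis = extents.index(max(extents))
        ordered = sorted(indices, key=lambda i: points[i][axis])
        # Size the left group in proportion to the parts it must hold,
        # so that part counts that are not powers of two stay balanced.
        left_parts = parts // 2
        cut = len(ordered) * left_parts // parts
        return split(ordered[:cut], left_parts) + split(ordered[cut:], parts - left_parts)

    return split(list(range(len(points))), num_parts)


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (10, 0), (11, 0), (0, 5), (1, 5), (10, 5), (11, 5)]
    print(rcb(pts, 4))
```

A production partitioner would also account for per-point weights and run in parallel across distributed memory; this sketch only shows the recursive cutting idea.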