Publications

Results 976–1000 of 9,998

Deep Conservation: A Latent-Dynamics Model for Exact Satisfaction of Physical Conservation Laws

35th AAAI Conference on Artificial Intelligence, AAAI 2021

Lee, Kookjin L.; Carlberg, Kevin T.

This work proposes an approach for latent-dynamics learning that exactly enforces physical conservation laws. The method comprises two steps. First, the method computes a low-dimensional embedding of the high-dimensional dynamical-system state using deep convolutional autoencoders. This defines a low-dimensional nonlinear manifold on which the state is subsequently enforced to evolve. Second, the method defines a latent-dynamics model that is defined as the solution of a constrained optimization problem. Here, the objective function is defined as the sum of squares of conservation-law violations over control volumes within a finite-volume discretization of the problem; nonlinear equality constraints explicitly enforce conservation over prescribed subdomains of the problem. Under modest conditions, the resulting dynamics model guarantees that the time evolution of the latent state exactly satisfies conservation laws over the prescribed subdomains.

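As a minimal sketch of the method's second step: the latent update is obtained by minimizing squared conservation-law violations subject to subdomain conservation constraints. The linear decoder, the stand-in residual flux_residual, and all dimensions below are hypothetical placeholders for the paper's convolutional autoencoder and finite-volume residual.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    n, k = 50, 4                       # full-state and latent dimensions (illustrative)
    W = rng.standard_normal((n, k))    # linear decoder stands in for the trained autoencoder
    A = 0.01 * rng.standard_normal((n, n))
    S = np.zeros((2, n)); S[0, :25] = 1.0; S[1, 25:] = 1.0  # two prescribed subdomains

    def dec(z):
        return W @ z

    def flux_residual(w, w_old, dt):
        # per-control-volume conservation violation for a backward-Euler step
        return (w - w_old) / dt + A @ w

    def latent_step(z_old, dt):
        w_old = dec(z_old)
        obj = lambda z: np.sum(flux_residual(dec(z), w_old, dt) ** 2)
        cons = {"type": "eq",          # exact conservation over each subdomain
                "fun": lambda z: S @ flux_residual(dec(z), w_old, dt)}
        return minimize(obj, z_old, constraints=[cons], method="SLSQP").x

    z_next = latent_step(np.zeros(k), dt=0.01)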

Evolving Spiking Circuit Motifs Using Weight Agnostic Neural Networks

35th AAAI Conference on Artificial Intelligence, AAAI 2021

Anwar, Abrar

Neural architecture search (NAS) has emerged as an algorithmic method of developing neural network architectures. Weight Agnostic Neural Networks (WANNs) are an evolutionary NAS approach. Fundamentally, WANNs find network structures that are relatively insensitive to shifts in weight values and are typically much smaller than dense networks of equivalent performance. Here, we extend the WANN framework to search for spiking circuits and, in doing so, investigate whether these circuit motifs can also yield task performance that is weight agnostic. We analyze properties such as the complexity of the solutions as well as their performance. Our results demonstrate the performance of spiking WANNs on several exemplar tasks.

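The core WANN evaluation loop is easy to sketch: a topology is scored by its average task performance across a small set of shared weight values, so only weight-insensitive structures score well. Everything below (the tanh node model, the toy boolean task) is an illustrative assumption, not the paper's spiking setup.

    import numpy as np

    SHARED_WEIGHTS = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]  # one weight shared by all connections

    def forward(adj, w, x):
        """Activate a feed-forward topology in which every connection has weight w.
        adj[i, j] = 1 if node i feeds node j; nodes are assumed topologically ordered."""
        act = np.zeros(adj.shape[0])
        act[:x.size] = x                                  # input nodes come first
        for j in range(x.size, adj.shape[0]):
            act[j] = np.tanh(w * (adj[:, j] @ act))
        return act[-1]                                    # last node is the output

    def weight_agnostic_score(adj, X, y):
        # Mean accuracy across all shared weight values: a topology scores well
        # only if it solves the task regardless of the particular weight.
        accs = []
        for w in SHARED_WEIGHTS:
            pred = np.array([forward(adj, w, x) > 0 for x in X])
            accs.append(np.mean(pred == y))
        return float(np.mean(accs))

    # Toy usage: 2 inputs, 1 hidden node, 1 output on a small boolean task.
    adj = np.zeros((4, 4)); adj[0, 2] = adj[1, 2] = adj[2, 3] = 1
    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    y = np.array([False, True, True, True])               # OR-like target
    print(weight_agnostic_score(adj, X, y))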

Towards Predictive Plasma Science and Engineering through Revolutionary Multi-Scale Algorithms and Models (Final Report)

Laity, George R.; Robinson, Allen C.; Cuneo, M.E.; Alam, Mary K.; Beckwith, Kristian B.; Bennett, Nichelle L.; Bettencourt, Matthew T.; Bond, Stephen D.; Cochrane, Kyle C.; Criscenti, Louise C.; Cyr, Eric C.; De Zetter, Karen J.; Drake, Richard R.; Evstatiev, Evstati G.; Fierro, Andrew S.; Gardiner, Thomas A.; Glines, Forrest W.; Goeke, Ronald S.; Hamlin, Nathaniel D.; Hooper, Russell H.; Koski, Jason K.; Lane, James M.; Larson, Steven R.; Leung, Kevin L.; McGregor, Duncan A.; Miller, Philip R.; Miller, Sean M.; Ossareh, Susan J.; Phillips, Edward G.; Simpson, Sean S.; Sirajuddin, David S.; Smith, Thomas M.; Swan, Matthew S.; Thompson, Aidan P.; Tranchida, Julien G.; Bortz-Johnson, Asa J.; Welch, Dale R.; Russell, Alex M.; Watson, Eric D.; Rose, David V.; McBride, Ryan D.

This report describes the high-level accomplishments from the Plasma Science and Engineering Grand Challenge LDRD at Sandia National Laboratories. The Laboratory has a need to demonstrate predictive capabilities for modeling plasma phenomena in order to rapidly accelerate engineering development in several mission areas. The purpose of this Grand Challenge LDRD was to advance the fundamental models, methods, and algorithms, along with the supporting electrode-science foundation, to enable a revolutionary shift toward predictive plasma engineering design principles. This project integrated the SNL knowledge base in computer science, plasma physics, materials science, applied mathematics, and relevant application engineering to establish new cross-laboratory collaborations on these topics. As an initial exemplar, this project focused its efforts on improving the multi-scale modeling capabilities used to predict electrical power delivery on large-scale pulsed power accelerators. Specifically, this LDRD was structured into three primary research thrusts that, when integrated, enable complex simulations of these devices: (1) the exploration of multi-scale models describing the desorption of contaminants from pulsed power electrodes, (2) the development of improved algorithms and code technologies to treat the multi-physics phenomena required to predict device performance, and (3) the creation of a rigorous verification and validation infrastructure to evaluate the codes and models across a range of challenge problems. These components were integrated into initial demonstrations of the largest simulations of multi-level vacuum power flow completed to date, executed on the leading HPC machines available in the NNSA complex today. These preliminary studies indicate that relevant pulsed power engineering design simulations can now be completed on a timescale of several days, a significant improvement over pre-LDRD levels of performance.


Multi-scale physics-based modeling of particle-impact erosion of CMCs

AIAA Scitech 2021 Forum

Newsome, David; Waxman, Rae; Giles, Stephen; Silling, Stewart A.

Aeroengines ingest foreign object debris, such as sand, that eventually erodes components through repeated impacts. Because of the wide feature space, modeling and simulation are needed to rapidly assess the erosion behavior of materials such as composites. Peridynamic simulations were performed to analyze the erosion of a SiC/SiC composite under sand impacts, giving direct insight into the impact-erosion mechanism and the amount of material removed. The erosion data were strongly correlated with impact velocity and angle, yielding predictive equations.

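The "predictive equations" step can be illustrated by fitting a conventional power-law erosion correlation, E = K v^n sin^m(theta), to impact data; this functional form and the synthetic data below are assumptions for illustration, not the paper's fitted model.

    import numpy as np
    from scipy.optimize import curve_fit

    def erosion_model(X, K, n, m):
        v, theta = X                        # impact speed (m/s) and angle (rad)
        return K * v**n * np.sin(theta)**m  # common power-law erosion correlation

    # Synthetic data standing in for peridynamic erosion results.
    rng = np.random.default_rng(1)
    v = rng.uniform(50.0, 300.0, 40)
    theta = rng.uniform(0.2, np.pi / 2, 40)
    E = 1e-6 * v**2.3 * np.sin(theta)**1.5 * rng.normal(1.0, 0.05, 40)

    (K, n, m), _ = curve_fit(erosion_model, (v, theta), E, p0=(1e-6, 2.0, 1.0))
    print(f"E = {K:.2e} * v^{n:.2f} * sin(theta)^{m:.2f}")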

FROSch Preconditioners for Land Ice Simulations of Greenland and Antarctica

Heinlein, Alexander H.; Perego, Mauro P.; Rajamanickam, Sivasankaran R.

Numerical simulations of Greenland and Antarctic ice sheets involve the solution of large-scale highly nonlinear systems of equations on complex shallow geometries. This work is concerned with the construction of Schwarz preconditioners for the solution of the associated tangent problems, which are challenging for solvers mainly because of the strong anisotropy of the meshes and wildly changing boundary conditions that can lead to poorly constrained problems on large portions of the domain. Here, two-level GDSW (Generalized Dryja–Smith–Widlund) type Schwarz preconditioners are applied to different land ice problems: a velocity problem, a temperature problem, and the coupling of the two. We employ the MPI-parallel implementation of multi-level Schwarz preconditioners provided by the package FROSch (Fast and Robust Schwarz) from the Trilinos library. The strength of the proposed preconditioner is that it yields scalable and robust preconditioners for the single-physics problems out of the box. To our knowledge, this is the first time two-level Schwarz preconditioners have been applied to the ice sheet problem and a scalable preconditioner has been used for the coupled problem. The preconditioner for the coupled problem differs from previous monolithic GDSW preconditioners in that decoupled extension operators are used to compute the values in the interior of the subdomains. Several approaches for improving performance, such as reuse strategies and shared-memory OpenMP parallelization, are explored as well. In our numerical study we consider both uniform meshes of varying resolution for the Antarctic ice sheet and non-uniform meshes for the Greenland ice sheet. We present several weak and strong scaling studies confirming the robustness of the approach and the parallel scalability of the FROSch implementation. Highlights of the numerical results include a weak scaling study with up to 32K processor cores (8K MPI ranks and 4 OpenMP threads each) and 566M degrees of freedom for the velocity problem, as well as a strong scaling study with up to 4K processor cores (and MPI ranks) and 68M degrees of freedom for the coupled problem.

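For readers unfamiliar with the Schwarz idea, a one-level overlapping additive Schwarz preconditioner on a 1-D Laplacian is sketched below. FROSch's GDSW preconditioners add a coarse level built from energy-minimizing extensions and run MPI-parallel through Trilinos, none of which this serial sketch attempts.

    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    n, nsub, ovl = 200, 8, 4
    A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")

    size = n // nsub                       # overlapping subdomain index sets
    subsets = [np.arange(max(0, i * size - ovl), min(n, (i + 1) * size + ovl))
               for i in range(nsub)]
    solves = [spla.factorized(sp.csc_matrix(A[s, :][:, s])) for s in subsets]

    def schwarz(r):
        # M^{-1} r = sum_i R_i^T A_i^{-1} R_i r   (one-level additive Schwarz)
        r = np.ravel(r)
        z = np.zeros_like(r)
        for s, solve in zip(subsets, solves):
            z[s] += solve(r[s])
        return z

    M = spla.LinearOperator((n, n), matvec=schwarz)
    x, info = spla.cg(A, A @ np.ones(n), M=M)
    print("CG converged" if info == 0 else f"info = {info}")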

Integrated fluid and materials modeling of environmental barrier coatings

AIAA Scitech 2021 Forum

Newsome, David; Waxman, Rae; Hoffie, Andreas; Silling, Stewart A.

Environmental Barrier Coatings (EBCs) protect ceramic matrix composites from exposure to the high-temperature moisture present in turbine operation through their dense top coats. However, moisture is able to diffuse through and oxidize the Si bond coat to form the Thermally Grown Oxide (TGO), a layer of SiO2 in which the incorporation of O causes swelling and stress. With sufficient TGO-based swelling, the EBC fails through increased damage such as delamination. A multiscale simulation framework has been developed to link the operating conditions of a high-performance turbine to the failure modes of the EBC. Computational fluid dynamics (CFD) simulations of the E3 turbine were performed with the Loci/CHEM software and compared to prior literature data to demonstrate its fidelity in determining the flow conditions on the turbine blade surface. The CFD simulations then provided boundary-condition data for pressure and heat flux, yielding the temperature at the bond coat. Peridynamics was used to model the microscale TGO growth. A swelling model linking moisture concentration to strain at the TGO, caused by the volume increase from oxidation, was demonstrated, coupling moisture transport to localized strain and allowing TGO growth and the corresponding damage to be observed directly. This framework is general and can be adapted to a range of EBC microstructures and operating conditions.

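A minimal sketch of the coupling idea, under strong simplifying assumptions: diffuse moisture through the coating in 1-D and convert local concentration into a swelling strain via a linear law. Grid, coefficients, and the swelling law are illustrative placeholders, not the paper's calibrated peridynamic model.

    import numpy as np

    nx, dx = 100, 1.0e-6                 # 100 cells of 1 micron across the coating
    D, dt = 1.0e-12, 0.1                 # moisture diffusivity (m^2/s), time step (s)
    beta = 0.05                          # illustrative linear swelling coefficient
    c = np.zeros(nx); c[0] = 1.0         # normalized moisture; unit source at top coat

    for _ in range(20000):               # explicit FTCS update (stable: D*dt/dx^2 = 0.1)
        c[1:-1] += D * dt / dx**2 * (c[2:] - 2.0 * c[1:-1] + c[:-2])
        c[0], c[-1] = 1.0, c[-2]         # fixed surface moisture; zero-flux back face

    strain = beta * c                    # linear swelling law: eps(x) = beta * c(x)
    print("swelling strain at mid-coating and bond coat:", strain[nx // 2], strain[-1])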

Spiking Neural Streaming Binary Arithmetic

Proceedings - 2021 International Conference on Rebooting Computing, ICRC 2021

Aimone, James B.; Hill, Aaron J.; Severa, William M.; Vineyard, Craig M.

Boolean functions and binary arithmetic operations are central to standard computing paradigms. Accordingly, many advances in computing have focused upon how to make these operations more efficient, as well as exploring what they can compute. To best leverage the advantages of novel computing paradigms, it is important to consider what unique computing approaches they offer. However, for any special-purpose co-processor, Boolean functions and binary arithmetic operations are useful for, among other things, avoiding unnecessary I/O on and off the co-processor by pre- and post-processing data on-device. This is especially true for spiking neuromorphic architectures, where these basic operations are not fundamental low-level operations and instead require specific implementation. Here we discuss the implications of an advantageous streaming binary encoding method, as well as a handful of circuits designed to exactly compute elementary Boolean and binary operations.

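The flavor of streaming binary arithmetic can be shown in a few lines: with LSB-first spike trains, AND is a threshold-2 coincidence neuron, and addition needs only one bit of carry state, which a spiking circuit can hold on a self-recurrent synapse. This generic sketch is not the paper's specific circuit construction.

    def stream_and(a, b):
        # Threshold-2 coincidence neuron: spikes only when both inputs spike.
        return [int(x + y >= 2) for x, y in zip(a, b)]

    def stream_add(a, b):
        # Streaming adder on LSB-first spike trains: the carry is state held
        # from one time step to the next (e.g., on a self-recurrent synapse).
        carry, out = 0, []
        for x, y in zip(a, b):
            s = x + y + carry
            out.append(s & 1)        # sum bit spikes when the input count is odd
            carry = s >> 1           # carry persists into the next time step
        return out + [carry]

    # 6 + 3 = 9, encoded LSB first: 6 = [0,1,1], 3 = [1,1,0]
    print(stream_add([0, 1, 1], [1, 1, 0]))   # -> [1, 0, 0, 1]  (= 9)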

A Bayesian Machine Learning Framework for Selection of the Strain Gradient Plasticity Multiscale Model

ASME International Mechanical Engineering Congress and Exposition, Proceedings (IMECE)

Tan, Jingye; Maupin, Kathryn A.; Shao, Shuai; Faghihi, Danial

A class of sequential multiscale models investigated in this study consists of discrete dislocation dynamics (DDD) simulations and continuum strain gradient plasticity (SGP) models to simulate the size effect in plastic deformation of metallic micropillars. The high-fidelity DDD explicitly simulates the microstructural (dislocation) interactions. These simulations account for the effect of dislocation densities and their spatial distributions on plastic deformation. The continuum SGP captures the size-dependent plasticity in micropillars using two length parameters. The main challenge in predictive DDD-SGP multiscale modeling is selecting the proper constitutive relations for the SGP model, which is necessitated by the uncertainty in computational prediction due to DDD's microstructural randomness. This contribution addresses these challenges using a Bayesian learning and model selection framework. A family of SGP models with different fidelities and complexities is constructed using various constitutive relation assumptions. The parameters of the SGP models are then learned from a set of training data furnished by the DDD simulations of micropillars. Bayesian learning allows the assessment of the credibility of plastic deformation prediction by characterizing the microstructural variability and the uncertainty in training data. Additionally, the family of the possible SGP models is subjected to a Bayesian model selection to pick the model that adequately explains the DDD training data. The framework proposed in this study enables learning the physics-based multiscale model from uncertain observational data and determining the optimal computational model for predicting complex physical phenomena, i.e., size effect in plastic deformation of micropillars.

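The selection step can be illustrated with a BIC-style approximation to Bayesian model comparison: fit each candidate by maximum likelihood and penalize complexity. The polynomial candidates and synthetic data are stand-ins for the SGP model family and DDD training data; the paper's framework uses full Bayesian calibration rather than BIC.

    import numpy as np

    rng = np.random.default_rng(2)
    x = np.linspace(0.0, 1.0, 30)
    y = 1.0 + 0.5 * x**2 + rng.normal(0.0, 0.05, x.size)   # stand-in training data

    def bic(deg):
        # Maximum-likelihood polynomial fit; BIC ~ -2 log(evidence) + penalty.
        resid = y - np.polyval(np.polyfit(x, y, deg), x)
        sigma2 = np.mean(resid**2)
        loglik = -0.5 * x.size * (np.log(2.0 * np.pi * sigma2) + 1.0)
        k = deg + 2                                        # coefficients + noise variance
        return -2.0 * loglik + k * np.log(x.size)

    scores = {deg: bic(deg) for deg in (1, 2, 3)}
    print(scores, "-> select degree", min(scores, key=scores.get))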

Performance Portability of an SpMV Kernel Across Scientific Computing and Data Science Applications

2021 IEEE High Performance Extreme Computing Conference, HPEC 2021

Olivier, Stephen L.; Ellingwood, Nathan D.; Berry, Jonathan W.; Dunlavy, Daniel D.

Both the data science and scientific computing communities are embracing GPU acceleration for their most demanding workloads. For scientific computing applications, the massive volume of code and diversity of hardware platforms at supercomputing centers has motivated a strong effort toward performance portability. This property of a program, denoting its ability to perform well on multiple architectures and varied datasets, is heavily dependent on the choice of parallel programming model and which features of the programming model are used. In this paper, we evaluate performance portability in the context of a data science workload in contrast to a scientific computing workload, evaluating the same sparse matrix kernel on both. Among our implementations of the kernel in different performance-portable programming models, we find that many struggle to consistently achieve performance improvements using the GPU compared to simple one-line OpenMP parallelization on high-end multicore CPUs. We show one that does, and its performance approaches and sometimes even matches that of vendor-provided GPU math libraries.

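For reference, the kernel under study, sparse matrix-vector multiplication in CSR format, reduces to the short loop below. The paper's question is how this loop maps onto performance-portable programming models; the row loop is the natural parallel dimension (e.g., a one-line OpenMP pragma on CPUs).

    import numpy as np

    def spmv_csr(row_ptr, col_idx, vals, x):
        """y = A @ x for A in CSR format. The outer loop over rows is where
        parallelism is applied (e.g., '#pragma omp parallel for' on CPUs)."""
        y = np.zeros(len(row_ptr) - 1)
        for i in range(len(y)):
            for k in range(row_ptr[i], row_ptr[i + 1]):
                y[i] += vals[k] * x[col_idx[k]]
        return y

    # 3x3 example: [[2, 0, 1], [0, 3, 0], [4, 0, 5]]
    row_ptr = [0, 2, 3, 5]
    col_idx = [0, 2, 1, 0, 2]
    vals = [2.0, 1.0, 3.0, 4.0, 5.0]
    print(spmv_csr(row_ptr, col_idx, vals, np.ones(3)))  # -> [3. 3. 9.]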

Using Computation Effectively for Scalable Poisson Tensor Factorization: Comparing Methods beyond Computational Efficiency

2021 IEEE High Performance Extreme Computing Conference, HPEC 2021

Myers, Jeremy M.; Dunlavy, Daniel D.

Poisson Tensor Factorization (PTF) is an important data analysis method for analyzing patterns and relationships in multiway count data. In this work, we consider several algorithms for computing a low-rank PTF of tensors with sparse count data values via maximum likelihood estimation. Such an approach reduces to solving a nonlinear, non-convex optimization problem, which can leverage considerable parallel computation due to the structure of the problem. However, since the maximum likelihood estimator corresponds to the global minimizer of this optimization problem, it is important to consider how effective methods are at both leveraging this inherent parallelism and computing a good approximation to the global minimizer. In this work we present comparisons of multiple methods for PTF that illustrate the tradeoffs between computational efficiency and accurate computation of the maximum likelihood estimator. We present results using synthetic and real-world data tensors to demonstrate some of the challenges when choosing a method for a given tensor.

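The matrix special case conveys the computation: maximum-likelihood Poisson factorization via multiplicative KL updates, the two-way analogue of CP-APR-style tensor algorithms. The random data and rank below are purely illustrative.

    import numpy as np

    def poisson_mf(X, rank, iters=200, eps=1e-10):
        """X ~ Poisson(W @ H) fit by multiplicative KL updates (the two-way
        analogue of CP-APR-style algorithms for Poisson tensor factorization)."""
        rng = np.random.default_rng(0)
        m, n = X.shape
        W = rng.uniform(0.1, 1.0, (m, rank))
        H = rng.uniform(0.1, 1.0, (rank, n))
        ones = np.ones_like(X)
        for _ in range(iters):
            W *= (X / (W @ H + eps)) @ H.T / (ones @ H.T + eps)
            H *= W.T @ (X / (W @ H + eps)) / (W.T @ ones + eps)
        return W, H

    X = np.random.default_rng(3).poisson(3.0, (20, 15)).astype(float)
    W, H = poisson_mf(X, rank=4)
    print("mean of reconstruction:", (W @ H).mean())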

Low-Communication Asynchronous Distributed Generalized Canonical Polyadic Tensor Decomposition

2021 IEEE High Performance Extreme Computing Conference, HPEC 2021

Lewis, Cannada L.; Phipps, Eric T.

In this work, we show that reduced-communication algorithms for distributed stochastic gradient descent improve the time per epoch and strong scaling of the Generalized Canonical Polyadic (GCP) tensor decomposition, but at a cost: achieving convergence becomes more difficult. Our MPI-based implementation shows that while one-sided algorithms offer a path to asynchronous execution, the performance benefits of optimized allreduce are difficult to best.

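A serial sketch of the stochastic-gradient core, on a matrix with Gaussian loss rather than a general tensor: gradients are estimated from sampled entries and applied to the factor matrices. In the distributed setting, these sampled-gradient contributions are what ranks combine via allreduce, or via one-sided accumulation for asynchronous execution. All names and sizes below are illustrative.

    import numpy as np

    rng = np.random.default_rng(4)
    m, n, r = 100, 80, 5
    X = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # data stand-in

    A = 0.1 * rng.standard_normal((m, r))
    B = 0.1 * rng.standard_normal((n, r))
    lr, batch = 0.01, 512

    for step in range(2000):
        i = rng.integers(0, m, batch); j = rng.integers(0, n, batch)
        e = (A[i] * B[j]).sum(axis=1) - X[i, j]    # residuals on sampled entries
        gA, gB = e[:, None] * B[j], e[:, None] * A[i]
        # In the distributed algorithm, these sampled-gradient contributions are
        # what ranks would combine (allreduce, or one-sided accumulation).
        np.add.at(A, i, -lr * gA)
        np.add.at(B, j, -lr * gB)

    print("RMSE:", np.sqrt(np.mean((A @ B.T - X) ** 2)))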

Gate Set Tomography

Quantum

Nielsen, Erik N.; Gamble, John K.; Rudinger, Kenneth M.; Scholten, Travis; Young, Kevin; Blume-Kohout, Robin J.

Gate set tomography (GST) is a protocol for detailed, predictive characterization of logic operations (gates) on quantum computing processors. Early versions of GST emerged around 2012-13, and since then it has been refined, demonstrated, and used in a large number of experiments. This paper presents the foundations of GST in comprehensive detail. The most important feature of GST, compared to older state and process tomography protocols, is that it is calibration-free. GST does not rely on pre-calibrated state preparations and measurements. Instead, it characterizes all the operations in a gate set simultaneously and self-consistently, relative to each other. Long sequence GST can estimate gates with very high precision and efficiency, achieving Heisenberg scaling in regimes of practical interest. In this paper, we cover GST’s intellectual history, the techniques and experiments used to achieve its intended purpose, data analysis, gauge freedom and fixing, error bars, and the interpretation of gauge-fixed estimates of gate sets. Our focus is fundamental mathematical aspects of GST, rather than implementation details, but we touch on some of the foundational algorithmic tricks used in the pyGSTi implementation.

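The gauge freedom at the heart of GST's calibration-free, self-consistent estimation can be stated in one line. In the superoperator (transfer-matrix) representation, every predicted circuit probability

    p = \langle\langle E \,|\, G_{i_L} \cdots G_{i_1} \,|\, \rho \rangle\rangle

is unchanged by the simultaneous transformation

    |\rho\rangle\rangle \to M |\rho\rangle\rangle, \qquad \langle\langle E| \to \langle\langle E|\, M^{-1}, \qquad G_k \to M G_k M^{-1}

for any invertible matrix M. This is why a gate set is estimated only up to gauge, and why gauge fixing and the interpretation of gauge-fixed estimates receive dedicated treatment in the paper.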

Rendezvous algorithms for large-scale modeling and simulation

Journal of Parallel and Distributed Computing

Plimpton, Steven J.; Knight, Christopher

Rendezvous algorithms encode a communication pattern that is useful when processors sending data do not know who the receiving processors should be, or vice versa. The idea is to define an intermediate decomposition where datums from different sending processors can "rendezvous" to perform a computation, in a manner that both the senders and eventual receivers of the results can identify the appropriate rendezvous processor. Originally designed for interpolating between overlaid grids with independent parallel decompositions (Plimpton et al., 2004), we have recently found rendezvous algorithms useful for a variety of operations in particle- or grid-based simulation codes when running large problems on large numbers of processors. In particular, we show they can perform well when a load-balanced intermediate decomposition is randomized and not spatial, requiring all-to-all communication to move data between processors. In this case rendezvous algorithms leverage the large bisection communication bandwidths which parallel machines provide. We describe how rendezvous algorithms work in a scientific computing context and give specific examples for molecular dynamics and Direct Simulation Monte Carlo codes which result in dramatic performance improvements versus simpler algorithms which do not scale as well. We explain how a generic rendezvous algorithm can be implemented, and also point out similarities with the MapReduce paradigm popularized by Google and Hadoop.

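The generic pattern is compact in mpi4py: each rank routes its datums to a rendezvous rank chosen by a deterministic hash of the key, the computation happens there, and results are routed back the same way. The helper names and toy task below are placeholders.

    import zlib
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, nprocs = comm.Get_rank(), comm.Get_size()

    def rendezvous(items, key, work):
        """Route (key, value) items to hash-selected rendezvous ranks, apply
        work() there, and route the results back to the original senders."""
        # 1) Bucket datums by rendezvous owner (deterministic hash) and exchange.
        out = [[] for _ in range(nprocs)]
        for it in items:
            out[zlib.crc32(key(it).encode()) % nprocs].append((rank, it))
        inbox = comm.alltoall(out)
        # 2) Perform the computation on the rendezvous rank, tracking senders.
        replies = [[] for _ in range(nprocs)]
        for bucket in inbox:
            for sender, it in bucket:
                replies[sender].append(work(it))
        # 3) A second all-to-all returns results to the original decomposition.
        return [r for bucket in comm.alltoall(replies) for r in bucket]

    # Toy usage: datums scattered across ranks; run with e.g. mpirun -np 4.
    mine = [(f"key{i}", rank) for i in range(rank, 8, nprocs)]
    doubled = rendezvous(mine, key=lambda kv: kv[0], work=lambda kv: kv[1] * 2)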

Proctor: A Semi-Supervised Performance Anomaly Diagnosis Framework for Production HPC Systems

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Aksar, Burak; Zhang, Yijia; Ates, Emre; Schwaller, Benjamin S.; Aaziz, Omar R.; Leung, Vitus J.; Brandt, James M.; Egele, Manuel; Coskun, Ayse K.

Performance variation diagnosis in High-Performance Computing (HPC) systems is a challenging problem due to the size and complexity of the systems. Application performance variation leads to premature termination of jobs, decreased energy efficiency, or wasted computing resources. Manual root-cause analysis of performance variation based on system telemetry has become an increasingly time-intensive process as it relies on human experts and the size of telemetry data has grown. Recent methods use supervised machine learning models to automatically diagnose previously encountered performance anomalies in compute nodes. However, supervised machine learning models require large labeled data sets for training. This labeled data requirement is restrictive for many real-world application domains, including HPC systems, because collecting labeled data is challenging and time-consuming, especially considering anomalies that sparsely occur. This paper proposes a novel semi-supervised framework that diagnoses previously encountered performance anomalies in HPC systems using a limited number of labeled data points, which is more suitable for production system deployment. Our framework first learns performance anomalies’ characteristics by using historical telemetry data in an unsupervised fashion. In the following process, we leverage supervised classifiers to identify anomaly types. While most semi-supervised approaches do not typically use anomalous samples, our framework takes advantage of a few labeled anomalous samples to classify anomaly types. We evaluate our framework on a production HPC system and on a testbed HPC cluster. We show that our proposed framework achieves 60% F1-score on average, outperforming state-of-the-art supervised methods by 11%, and maintains an average 0.06% anomaly miss rate.

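The two-stage structure can be sketched with off-the-shelf components: learn a representation from plentiful unlabeled telemetry, then train a classifier on the few labeled windows. PCA and a random forest below are generic stand-ins for the framework's actual components, and the data are synthetic.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(5)
    unlabeled = rng.standard_normal((5000, 64))   # historical telemetry windows
    X_few = rng.standard_normal((40, 64))         # the few labeled windows
    y_few = rng.integers(0, 3, 40)                # anomaly-type labels (0 = healthy)

    # Stage 1: unsupervised representation learned from the unlabeled data.
    rep = PCA(n_components=10).fit(unlabeled)

    # Stage 2: supervised anomaly-type classifier trained on the scarce labels.
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(rep.transform(X_few), y_few)

    new_window = rng.standard_normal((1, 64))
    print("diagnosed type:", clf.predict(rep.transform(new_window))[0])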

Error estimates for the optimal control of a parabolic fractional pde

SIAM Journal on Numerical Analysis

Glusa, Christian A.; Otarola, Enrique

We consider the integral definition of the fractional Laplacian and analyze a linear-quadratic optimal control problem for the so-called fractional heat equation; control constraints are also considered. We derive existence and uniqueness results, first-order optimality conditions, and regularity estimates for the optimal variables. To discretize the state equation we propose a fully discrete scheme that relies on an implicit finite difference discretization in time combined with a piecewise linear finite element discretization in space. We derive stability results and a novel L2(0,T; L2(Ω)) a priori error estimate. On the basis of the aforementioned solution technique, we propose a fully discrete scheme for our optimal control problem that discretizes the control variable with piecewise constant functions, and we derive a priori error estimates for it. We illustrate the theory with one- and two-dimensional numerical experiments.

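For orientation, the integral definition of the fractional Laplacian referred to here is the standard one,

    (-\Delta)^s u(x) = C(n,s)\, \mathrm{p.v.} \int_{\mathbb{R}^n} \frac{u(x) - u(y)}{|x - y|^{n + 2s}} \, dy, \qquad s \in (0,1),

with normalization constant C(n,s), so that the fractional heat equation governing the state takes the form \partial_t u + (-\Delta)^s u = f; the precise way the control and the control constraints enter the problem is as specified in the paper.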

Deep learning of parameterized equations with applications to uncertainty quantification

International Journal for Uncertainty Quantification

Qin, Tong; Chen, Zhen; Jakeman, John D.; Xiu, Dongbin

We propose a learning algorithm for discovering unknown parameterized dynamical systems by using observational data of the state variables. Our method is built upon and extends recent work on discovering unknown dynamical systems, in particular work using deep neural networks (DNNs). We propose a DNN structure, largely based upon the residual network (ResNet), to learn not only the unknown form of the governing equation but also the random effect embedded in the system, which is generated by the random parameters. Once the DNN model is successfully constructed, it is able to produce system predictions over a longer term and for arbitrary parameter values. This allows us to conduct uncertainty quantification by evaluating solution statistics over the parameter space.

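The core construction is to learn the one-step increment, so that x_{n+1} ≈ x_n + N(x_n, α), and then roll the network forward recurrently. The sketch below uses sklearn's MLPRegressor as a stand-in for the paper's ResNet-style DNN, with a known decay ODE generating stand-in training data.

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(6)
    dt, N = 0.05, 4000
    alpha = rng.uniform(0.5, 2.0, N)                 # random system parameter
    x0 = rng.uniform(-1.0, 1.0, N)
    x1 = x0 * np.exp(-alpha * dt)                    # one-step data from x' = -alpha*x

    # Learn the increment, giving the residual form x_{n+1} = x_n + N(x_n, alpha).
    net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
    net.fit(np.column_stack([x0, alpha]), x1 - x0)

    def rollout(x, a, steps):
        traj = [x]
        for _ in range(steps):                       # recurrent ResNet-like rollout
            x = x + net.predict([[x, a]])[0]
            traj.append(x)
        return np.array(traj)

    print(rollout(1.0, a=1.0, steps=40)[-1], "vs exact", np.exp(-2.0))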