We present a projection-based reduced order model (pROM) methodology for transient heat transfer problems involving coupled conduction and enclosure radiation. The approach was demonstrated on two test problems of varying complexity. The reduced order models achieved substantial speedups (up to 185×) relative to the full order model with good accuracy (less than 3% L∞ error). An attractive feature of pROMs is that they come with a natural error indicator: the final residual norm at each time-step of the converged ROM solution. Using example test cases, we discuss how to interpret this error indicator to assess the accuracy of the ROM solution. The approach shows promise for many-query applications, such as uncertainty quantification and optimization. The reduced computational cost of the ROM relative to the full-order model (FOM) can enable the analysis of larger and more complex systems as well as the exploration of larger parameter spaces.
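The residual-norm error indicator can be illustrated with a minimal sketch. The example below is an assumption-laden toy, not the authors' code: it takes a single implicit-Euler step of a Galerkin pROM for a linear conduction-like system M du/dt + K u = f (no enclosure radiation), and returns the full-order residual norm evaluated at the converged ROM state as the error indicator; all names (`galerkin_prom_step`, the basis `Phi`) are hypothetical.

```python
import numpy as np

def galerkin_prom_step(M, K, f, Phi, q_old, dt):
    """One implicit-Euler step of a Galerkin pROM; returns the new reduced
    coordinates and the full-order residual norm used as an error indicator."""
    # Reduced operators (in practice these would be precomputed offline).
    Mr, Kr, fr = Phi.T @ M @ Phi, Phi.T @ K @ Phi, Phi.T @ f
    # Implicit Euler on the reduced system: (Mr/dt + Kr) q_new = Mr q_old / dt + fr
    q_new = np.linalg.solve(Mr / dt + Kr, Mr @ q_old / dt + fr)
    # Error indicator: norm of the full-order residual at the ROM solution.
    u_old, u_new = Phi @ q_old, Phi @ q_new
    residual = M @ (u_new - u_old) / dt + K @ u_new - f
    return q_new, np.linalg.norm(residual)

# Toy usage: a 1D conduction-like SPD operator and a stand-in for a POD basis.
rng = np.random.default_rng(0)
n, r = 200, 10
K = np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1)
M, f = np.eye(n), rng.random(n)
Phi, _ = np.linalg.qr(rng.random((n, r)))   # stand-in for a POD basis
q, err = galerkin_prom_step(M, K, f, Phi, np.zeros(r), dt=1e-2)
print("residual-norm error indicator:", err)
```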
High-fidelity hypersonic aerodynamic simulations require extensive computational resources, hindering their usage in hypersonic vehicle design and uncertainty quantification. Projection-based reduced-order models (ROMs) are a computationally cheaper alternative to full-order simulations that can provide major speedups with marginal loss of accuracy when solving many-query problems such as design optimization and uncertainty propagation. However, ROMs can present robustness and convergence issues, especially when trained over large ranges of input parameters and/or with few training samples. This paper presents the application of several residual minimization-based ROMs to hypersonic flows around flight vehicles using less training data than in previous work. The ROM demonstrations are accompanied by a comparison to fully data-driven approaches, including kriging and radial basis function interpolation. Results are presented for three test cases, including one three-dimensional flight vehicle. We show that registration-based ROMs trained on grid-tailored solutions can compute quantities of interest more accurately than data-driven approaches for a given sparse training set. We also find that the classic ℓ2 state error metric is not particularly useful when comparing different model reduction techniques on sparse training data sets.
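For context, a data-driven baseline of the kind compared against here can be built in a few lines. The sketch below is illustrative only (the parameters, sample counts, and quantity of interest are placeholders, not the paper's cases); it uses SciPy's `RBFInterpolator` to map flow-condition parameters to a scalar quantity of interest from a sparse training set.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(1)
X_train = rng.uniform([5.0, 0.0], [8.0, 10.0], size=(6, 2))  # hypothetical (Mach, AoA) samples
q_train = np.sin(X_train[:, 0]) + 0.1 * X_train[:, 1]        # placeholder QoI values

# Radial basis function surrogate trained on the sparse sample set.
surrogate = RBFInterpolator(X_train, q_train, kernel="thin_plate_spline")
X_test = np.array([[6.5, 4.0]])
print("predicted QoI:", surrogate(X_test))
```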
This work aims to advance computational methods for projection-based reduced-order models (ROMs) of linear time-invariant (LTI) dynamical systems. For such systems, current practice relies on ROM formulations expressing the state as a rank-1 tensor (i.e., a vector), leading to computational kernels that are memory-bandwidth bound and, therefore, ill-suited for scalable performance on modern architectures. This weakness can be particularly limiting when tackling many-query studies, where one needs to run a large number of simulations. This work introduces a reformulation, called rank-2 Galerkin, of the Galerkin ROM for LTI dynamical systems which converts the ROM problem from memory-bandwidth bound to compute bound. We present the details of the formulation and its implementation, and demonstrate its utility through numerical experiments using, as a test case, the simulation of elastic seismic shear waves in an axisymmetric domain. We quantify and analyze performance and scaling results for varying numbers of threads and problem sizes. Finally, we present an end-to-end demonstration of using the rank-2 Galerkin ROM for a Monte Carlo sampling study. We show that the rank-2 Galerkin ROM is one order of magnitude more efficient than the rank-1 Galerkin ROM (the current practice) and about 970 times more efficient than the full-order model, while maintaining accuracy in both the mean and the statistics of the field.
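The core idea can be sketched in a few lines of numpy. The example below is a simplified illustration under assumed names and a forward-Euler integrator (not the paper's scheme): propagating M Monte Carlo realizations of a reduced LTI system dq/dt = A q + f_m either one at a time, where each step is a matrix-vector (BLAS-2, memory-bandwidth-bound) product, or all at once as a rank-2 state, where each step is a single matrix-matrix (BLAS-3, compute-bound) product.

```python
import numpy as np

rng = np.random.default_rng(2)
r, M, dt, steps = 100, 512, 1e-3, 200
A = -np.eye(r) + 0.01 * rng.standard_normal((r, r))   # reduced LTI operator
F = rng.standard_normal((r, M))                        # per-sample reduced forcing

# Rank-1 formulation: loop over samples; each step is a matrix-vector product.
Q1 = np.zeros((r, M))
for m in range(M):
    q = np.zeros(r)
    for _ in range(steps):
        q = q + dt * (A @ q + F[:, m])
    Q1[:, m] = q

# Rank-2 formulation: the state is an r-by-M matrix; each step is one matrix-matrix product.
Q2 = np.zeros((r, M))
for _ in range(steps):
    Q2 = Q2 + dt * (A @ Q2 + F)

# Both formulations produce the same samples; only the kernel shape changes.
print("max difference between formulations:", np.abs(Q1 - Q2).max())
```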
Thermal protection system designers rely heavily on computational simulation tools for design optimization and uncertainty quantification. Because high-fidelity analysis tools are computationally expensive, analysts primarily use low-fidelity or surrogate models instead. In this work, we explore an alternative approach wherein projection-based reduced-order models (ROMs) are used to approximate the computationally infeasible high-fidelity model. ROMs are preferable to alternative approximation approaches for high-consequence applications due to the presence of rigorous error bounds. This work presents the first application of ROMs to ablation systems. In particular, we present results for Galerkin and least-squares Petrov-Galerkin ROMs of 1D and 2D ablation system models.
High-speed aerospace engineering applications rely heavily on computational fluid dynamics (CFD) models for design and analysis. This reliance on CFD models necessitates performing accurate and reliable uncertainty quantification (UQ) of the CFD models, which can be very expensive for hypersonic flows. Additionally, UQ approaches are many-query problems requiring many runs with a wide range of input parameters. One way to enable computationally expensive models to be used in such many-query problems is to employ projection-based reduced-order models (ROMs) in lieu of the (high-fidelity) full-order model (FOM). In particular, the least-squares Petrov–Galerkin (LSPG) ROM (equipped with hyper-reduction) has demonstrated the ability to significantly reduce simulation costs while retaining high levels of accuracy on a range of problems, including subsonic CFD applications. This allows LSPG ROM simulations to replace the FOM simulations in UQ studies, making UQ tractable even for large-scale CFD models. This work presents the first application of LSPG to a hypersonic CFD application, the Hypersonic International Flight Research Experimentation 1 (HIFiRE-1) in a three-dimensional, turbulent Mach 7.1 flow. This paper shows the ability of the ROM to significantly reduce computational costs while maintaining high levels of accuracy in computed quantities of interest.
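The LSPG idea referenced here and in the following abstracts can be summarized with a small sketch. The code below is a hedged illustration, not the HIFiRE-1 solver, and omits hyper-reduction: at each (implicit) solve, the full-order residual is minimized over the reduced coordinates via Gauss-Newton. The residual, Jacobian, and basis are toy stand-ins.

```python
import numpy as np

def lspg_step(residual, jacobian, Phi, q0, iters=10, tol=1e-10):
    """Minimize || r(Phi q) ||_2 over the reduced coordinates q (Gauss-Newton)."""
    q = q0.copy()
    for _ in range(iters):
        r = residual(Phi @ q)
        if np.linalg.norm(r) < tol:
            break
        J = jacobian(Phi @ q) @ Phi                      # full-order Jacobian times basis
        dq, *_ = np.linalg.lstsq(J, -r, rcond=None)      # least-squares update
        q = q + dq
    return q

# Toy usage: a mildly nonlinear steady residual r(u) = K u + 0.1 u^3 - f.
rng = np.random.default_rng(3)
n, k = 50, 5
K, f = 2.0 * np.eye(n), rng.random(n)
residual = lambda u: K @ u + 0.1 * u**3 - f
jacobian = lambda u: K + np.diag(0.3 * u**2)
Phi, _ = np.linalg.qr(rng.random((n, k)))                # stand-in reduced basis
q = lspg_step(residual, jacobian, Phi, np.zeros(k))
print("final residual norm:", np.linalg.norm(residual(Phi @ q)))
```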
This paper explores dynamic load balancing algorithms used by asynchronous many-task (AMT), or 'task-based', programming models to optimize task placement for scientific applications with dynamic workload imbalances. AMT programming models use overdecomposition of the computational domain. Overdecomposition provides a natural mechanism for domain developers to expose concurrency and break their computational domain into pieces that can be remapped to different hardware. This paper explores fully distributed load balancing strategies that have shown great promise for exascale-level computing but are challenging to reason about theoretically and to implement effectively. We present a novel theoretical analysis of a gossip-based load balancing protocol and use it to build an efficient implementation with fast convergence rates and high load balancing quality. We demonstrate our algorithm in a next-generation plasma physics application (EMPIRE) that induces time-varying workload imbalance due to spatial non-uniformity in particle density across the domain. Our novel, highly scalable load balancing algorithm achieves over a 3x speedup (particle work) compared to a bulk-synchronous MPI implementation without load balancing.
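A gossip protocol of this flavor can be illustrated with a simple averaging toy. The sketch below is an assumption, not the EMPIRE or AMT-runtime implementation: in each round every rank averages its load estimate with a few randomly chosen peers, so the estimates converge toward the global mean that would then guide task migration.

```python
import numpy as np

rng = np.random.default_rng(4)
n_ranks, fanout, rounds = 64, 2, 10
loads = rng.exponential(1.0, n_ranks)        # imbalanced particle work per rank
estimates = loads.copy()

for _ in range(rounds):
    for i in range(n_ranks):
        # Pick a few random peers (excluding self) and average the group's estimates.
        peers = rng.choice(np.delete(np.arange(n_ranks), i), size=fanout, replace=False)
        group = np.append(peers, i)
        estimates[group] = estimates[group].mean()

# The group averages conserve the total load, so estimates contract toward the mean.
print("true mean load:", loads.mean())
print("spread of estimates after gossip:", estimates.std())
```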
High-speed aerospace engineering applications rely heavily on computational fluid dynamics (CFD) models for design and analysis due to the expense and difficulty of flight tests and experiments. This reliance on CFD models necessitates performing accurate and reliable uncertainty quantification (UQ) of the CFD models. However, it is very computationally expensive to run CFD for hypersonic flows due to the fine grid resolution required to capture the strong shocks and large gradients that are typically present. Furthermore, UQ approaches are “many-query” problems requiring many runs with a wide range of input parameters. One way to enable computationally expensive models to be used in such many-query problems is to employ projection-based reduced-order models (ROMs) in lieu of the (high-fidelity) full-order model. In particular, the least-squares Petrov–Galerkin (LSPG) ROM (equipped with hyper-reduction) has demonstrated the ability to significantly reduce simulation costs while retaining high levels of accuracy on a range of problems including subsonic CFD applications. This allows computationally inexpensive LSPG ROM simulations to replace the full-order model simulations in UQ studies, which makes this many-query task tractable, even for large-scale CFD models. This work presents the first application of LSPG to a hypersonic CFD application. In particular, we present results for LSPG ROMs of the HIFiRE-1 in a three-dimensional, turbulent Mach 7.1 flow, showcasing the ability of the ROM to significantly reduce computational costs while maintaining high levels of accuracy in computed quantities of interest.
Truly predictive numerical simulations can only be obtained by performing uncertainty quantification. However, many realistic engineering applications require extremely complex and computationally expensive high-fidelity numerical simulations for their accurate performance characterization. Very often, the combination of complex physical models and extreme operating conditions can easily lead to hundreds of uncertain parameters that need to be propagated through high-fidelity codes. Under these circumstances, a single-fidelity uncertainty quantification approach, i.e. a workflow that only uses high-fidelity simulations, is infeasible due to its prohibitive overall computational cost. To overcome this difficulty, multifidelity strategies have emerged and gained popularity in recent years. Their core idea is to combine simulations with varying levels of fidelity/accuracy in order to obtain estimators or surrogates that can yield the same accuracy as their single-fidelity counterparts at a much lower computational cost. This goal is usually accomplished by defining a priori a sequence of discretization levels or physical modeling assumptions that can be used to decrease the complexity of a numerical model realization and thus its computational cost. Less attention has been dedicated to low-fidelity models that can be built directly from a small number of available high-fidelity simulations. In this work, we focus our attention on reduced order models (ROMs). Our main goal is to investigate the combination of multifidelity uncertainty quantification and ROMs in order to evaluate the possibility of obtaining an efficient framework for propagating uncertainties through expensive numerical codes. We focus on sampling-based multifidelity approaches, like the multifidelity control variate, and we consider several scenarios for a numerical test problem, namely the Kuramoto-Sivashinsky equation, for which the efficiency of the multifidelity-ROM estimator is compared to the standard (single-fidelity) Monte Carlo approach.
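The multifidelity control variate mentioned above has a compact generic form, sketched below. This is a worked toy (the models, budgets, and the Kuramoto-Sivashinsky context are replaced by placeholders): a few high-fidelity (HF) samples are combined with many cheap low-fidelity (LF, e.g. ROM) samples that are correlated with the HF output, reducing estimator variance at fixed cost.

```python
import numpy as np

rng = np.random.default_rng(5)
def hf(x): return np.sin(x) + 0.05 * x**2     # stand-in "expensive" model
def lf(x): return np.sin(x)                   # stand-in ROM / low-fidelity model

N, M = 20, 2000                               # small HF budget, large LF budget
x_hf = rng.normal(size=N)                     # shared samples evaluated by both models
x_lf = rng.normal(size=M)                     # extra cheap LF-only samples

y_hf, y_lf_shared, y_lf_extra = hf(x_hf), lf(x_hf), lf(x_lf)
# Control-variate weight estimated from the shared samples.
alpha = np.cov(y_hf, y_lf_shared)[0, 1] / np.var(y_lf_shared, ddof=1)
# Multifidelity estimate: HF mean corrected by the (better-resolved) LF mean.
mu_cv = y_hf.mean() + alpha * (y_lf_extra.mean() - y_lf_shared.mean())

print("plain MC (HF only):     ", y_hf.mean())
print("control-variate estimate:", mu_cv)
```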
High-speed aerospace engineering applications rely heavily on computational fluid dynamics (CFD) models for design and analysis due to the expense and difficulty of flight tests and experiments. This reliance on CFD models necessitates performing accurate and reliable uncertainty quantification (UQ) of the CFD models. However, it is very computationally expensive to run CFD for hypersonic flows due to the fine grid resolution required to capture the strong shocks and large gradients that are typically present. Additionally, UQ approaches are “many-query” problems requiring many runs with a wide range of input parameters. One way to enable computationally expensive models to be used in such many-query problems is to employ projection-based reduced-order models (ROMs) in lieu of the (high-fidelity) full-order model. In particular, the least-squares Petrov–Galerkin (LSPG) ROM (equipped with hyper-reduction) has demonstrated the ability to significantly reduce simulation costs while retaining high levels of accuracy on a range of problems including subsonic CFD applications [1, 2]. This allows computationally inexpensive LSPG ROM simulations to replace the full-order model simulations in UQ studies, which makes this many-query task tractable, even for large-scale CFD models. This work presents the first application of LSPG to a hypersonic CFD application. In particular, we present results for LSPG ROMs of the HIFiRE-1 in a three-dimensional, turbulent Mach 7.1 flow, showcasing the ability of the ROM to significantly reduce computational costs while maintaining high levels of accuracy in computed quantities of interest.
This project developed models of performance variability to enable robust design and certification. Material variability originating from heterogeneous microstructural features, such as grain and pore morphologies, has significant effects on component behavior and creates uncertainty in material response. The outcomes of this project are uncertainty quantification (UQ) enabled analysis of material variability effects on performance and, more generally, methods to evaluate the consequences of microstructural variability on material response. Current engineering material models typically do not incorporate microstructural variability explicitly; rather, functional forms are chosen based on intuition and parameters are selected to reflect mean behavior. Conversely, mesoscale models that capture the microstructural physics, and its inherent variability, are impractical to use at the engineering scale. Current efforts therefore ignore physical characteristics of systems that may be the predominant factors in quantifying system reliability. To address this gap we developed explicit connections between models of microstructural variability and component/system performance. Our focus on the variability of mechanical response due to grain and pore distributions enabled us to fully probe these influences on performance and to develop a methodology for propagating input variability to output performance. This project is at the forefront of data science and material modeling. We adapted and innovated upon progressive techniques in machine learning and uncertainty quantification to develop a new, physically based methodology addressing the core issue of the Engineering Materials Reliability (EMR) research challenge, namely modeling the constitutive response of materials with significant inherent variability and length scales.
This is the DARMA FY19-Q1 interim report. This document was generated with the Automatic Report Generator (ARG).
DARMA (Distributed Asynchronous Resilient Models for Applications) is a co-design effort addressing asynchronous many-task (AMT) programming models and hardware idiosyncrasies, and improving application programmer interface (API) characterization and definition by translating application co-design activities into meaningful requirements, thereby accelerating the development of the DARMA API. The DARMA API is a translation layer between an application-facing front end and a back end runtime system. The application-facing user-level front end is embedded in C++, inheriting the generic language constructs of C++ and adding semantics that facilitate expressing distributed asynchronous parallel programs. Though the implementation of the front end uses C++ constructs unfamiliar to many programmers to provide the front end semantics, it is nonetheless fully embedded in the C++ language and leverages a widely supported subset of C++14 functionality (gcc >= 4.9, clang >= 3.5, icc >= 16). The translation layer leverages C++ templates to map the user's code onto the back end runtime API. The back end API is a set of abstract classes and function signatures that runtime system developers must implement in accordance with the specification requirements in order to interface with application code written to the front end. Executable DARMA applications must link to a runtime system that implements the abstract back end API. It is intended that these implementations will be external, drawing upon existing runtime system technologies; however, a reference implementation will be provided in the DARMA code distribution. The front end, translation layer, and back end API are detailed herein. We also include a list of application requirements driving the specification (along with a list of the applications contributing to the requirements to date), a brief history of changes between previous versions of the specification, and a summary of the planned changes in upcoming versions of the specification. Appendices walk the user through a more detailed set of examples of applications written in the DARMA front end API and provide additional technical details for the interested reader.
We discuss algorithm-based resilience to silent data corruption (SDC) in a task-based domain-decomposition preconditioner for partial differential equations (PDEs). The algorithm exploits a reformulation of the PDE as a sampling problem, followed by a solution update through data manipulation that is resilient to SDC. The implementation is based on a server-client model where all state information is held by the servers, while clients are designed solely as computational units. Scalability tests run on up to approximately 51K cores show a parallel efficiency greater than 90%. We use a 2D elliptic PDE and a fault model based on random single bit-flips to demonstrate the resilience of the application to synthetically injected SDC. We discuss two fault scenarios: one based on the corruption of all data of a target task, and the other involving the corruption of a single data point. We show that for our application, given the test problem considered, a four-fold increase in the number of faults yields only a two-percentage-point increase in the overhead required to overcome their presence, from 7% to 9%. We then discuss potential savings in energy consumption via dynamic voltage/frequency scaling, and its interplay with fault rates and application overhead.
We present a domain-decomposition-based preconditioner for the solution of partial differential equations (PDEs) that is resilient to both soft and hard faults. The algorithm is based on the following steps: first, the computational domain is split into overlapping subdomains; second, the target PDE is solved on each subdomain for sampled values of the current local boundary conditions; third, the subdomain solution samples are collected and fed into a regression step to build maps between the subdomains' boundary conditions; finally, the intersection of these maps yields the updated state at the subdomain boundaries. This reformulation allows us to recast the problem as a set of independent tasks. The implementation relies on an asynchronous server-client framework, where one or more reliable servers hold the data, while the clients ask for tasks and execute them. This framework provides resiliency to hard faults: if a client crashes, it simply stops asking for work, and the servers distribute the work among the remaining live clients. Erroneous subdomain solves (e.g. due to soft faults) appear as corrupted data, which is either rejected if it causes a task to fail, or is seamlessly filtered out during the regression stage through a suitable noise model. Three different types of faults are modeled: hard faults modeling nodes (or clients) crashing, soft faults occurring during the communication of the tasks between server and clients, and soft faults occurring during task execution. We demonstrate the resiliency of the approach for a 2D elliptic PDE, and explore the effect of the faults at various failure rates.
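The sampling-plus-regression idea can be illustrated on a toy 1D problem. The sketch below uses assumed, simplified ingredients rather than the production server-client code: two overlapping subdomains of u'' = 0 on [0, 1] with u(0) = 0 and u(1) = 1, exact affine subdomain solves, and plain least-squares regression (no noise model); the interface values then follow from intersecting the two regressed maps.

```python
import numpy as np

rng = np.random.default_rng(6)

def solve_left(g):    # subdomain [0, 0.6] with u(0)=0, u(0.6)=g; return u(0.4)
    return g * 0.4 / 0.6

def solve_right(g):   # subdomain [0.4, 1] with u(0.4)=g, u(1)=1; return u(0.6)
    return g + (1.0 - g) * (0.6 - 0.4) / (1.0 - 0.4)

# Sample each subdomain's unknown interface value and record the induced value
# at the other interface point; these independent solves are the "tasks".
g_samples = rng.uniform(0.0, 1.0, 20)
a1, b1 = np.polyfit(g_samples, solve_left(g_samples), 1)    # map: u(0.6) -> u(0.4)
a2, b2 = np.polyfit(g_samples, solve_right(g_samples), 1)   # map: u(0.4) -> u(0.6)

# Intersection of the two affine maps: u04 = a1*u06 + b1 and u06 = a2*u04 + b2.
A = np.array([[1.0, -a1], [-a2, 1.0]])
u04, u06 = np.linalg.solve(A, np.array([b1, b2]))
print("updated interface values (exact: 0.4, 0.6):", u04, u06)
```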
The move towards extreme-scale computing platforms challenges scientific simulations in many ways. Given the recent tendencies in computer architecture development, legacy codes need to be reformulated to cope with large amounts of communication, system faults, and requirements of low memory usage per core. In this work, we develop a novel framework for solving partial differential equations (PDEs) via domain decomposition that reformulates the solution as a state of knowledge with a probabilistic interpretation. This reformulation provides resiliency with respect to potential faults without having to apply fault detection, avoids unnecessary communication, and is generally well positioned for rigorous uncertainty quantification studies that target improvements in the predictive fidelity of scientific models. We demonstrate our algorithm on one-dimensional PDE examples where artificial faults are implemented as bit-flips in the binary representation of subdomain solutions.
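Synthetic fault injection of the kind described above can be sketched with a small utility. The snippet below is a generic, assumed helper (not taken from the paper): it flips one random bit in the IEEE-754 representation of a float64 array entry, mimicking a silent corruption of a subdomain solution.

```python
import numpy as np

def inject_bitflip(values, rng):
    """Return a copy of `values` with a single random bit flipped in one entry."""
    corrupted = values.copy()
    bits = corrupted.view(np.uint64)                   # reinterpret the float64 payload
    idx = rng.integers(corrupted.size)                 # which entry to corrupt
    bit = np.uint64(1) << np.uint64(rng.integers(64))  # which bit to flip
    bits[idx] ^= bit
    return corrupted

rng = np.random.default_rng(7)
solution = np.linspace(0.0, 1.0, 8)                    # stand-in subdomain solution
print("corrupted subdomain solution:", inject_bitflip(solution, rng))
```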
The future of extreme-scale computing is expected to magnify the influence of soft faults as a source of inaccuracy or failure in solutions obtained from distributed parallel computations. The development of resilient computational tools represents an essential recourse for understanding the best methods for absorbing the impacts of soft faults without sacrificing solution accuracy. The Rexsss (Resilient Extreme Scale Scientific Simulations) project pursues the development of fault resilient algorithms for solving partial differential equations (PDEs) on distributed systems. Performance analyses of current algorithm implementations assist in the identification of runtime inefficiencies.