Improved validation for models of complex systems has been a primary focus over the past year for the Resilience in Complex Systems Research Challenge. This document describes a set of research directions that are the result of distilling those ideas into three categories of research -- epistemic uncertainty, strong tests, and value of information. The content of this document can be used to transmit valuable information to future research activities, update the Resilience in Complex Systems Research Challenge's roadmap, inform the upcoming FY18 Laboratory Directed Research and Development (LDRD) call and research proposals, and facilitate collaborations between Sandia and external organizations. The recommended research directions can provide topics for collaborative research, development of proposals, workshops, and other opportunities.
This report contains the written footprint of a Sandia-hosted workshop held in Albuquerque, New Mexico, June 22-23, 2016, on “Complex Systems Models and Their Applications: Towards a New Science of Verification, Validation and Uncertainty Quantification,” as well as of pre-work that fed into the workshop. The workshop’s intent was to explore and begin articulating research opportunities at the intersection between two important Sandia communities: the complex systems (CS) modeling community, and the verification, validation and uncertainty quantification (VVUQ) community. The overarching research opportunity (and challenge) that we ultimately hope to address is: how can we quantify the credibility of knowledge gained from complex systems models, knowledge that is often incomplete and interim, but will nonetheless be used, sometimes in real-time, by decision makers?
Computational science and engineering application programs are typically large, complex, and dynamic, and are often constrained by distribution limitations. As a means of making tractable rapid explorations of scientific and engineering application programs in the context of new, emerging, and future computing architectures, a suite of "miniapps" has been created to serve as proxies for full scale applications. Each miniapp is designed to represent a key performance characteristic that does or is expected to significantly impact the runtime performance of an application program. In this paper we introduce a methodology for assessing the ability of these miniapps to effectively represent these performance issues. We applied this methodology to three miniapps, examining the linkage between them and an application they are intended to represent. Herein we evaluate the fidelity of that linkage. This work represents the initial steps required to begin to answer the question, "Under what conditions does a miniapp represent a key performance characteristic in a full app?"
The Predictive Capability Maturity Model (PCMM) is an expert elicitation tool designed to characterize and communicate completeness of the approaches used for computational model definition, verification, validation, and uncertainty quantification associated with an intended application. The primary application of this tool at Sandia National Laboratories (SNL) has been for physics-based computational simulations in support of nuclear weapons applications. The two main goals of a PCMM evaluation are 1) the communication of computational simulation capability, accurately and transparently, and 2) the development of input for effective planning. As a result of the increasing importance of computational simulation to SNL's mission, the PCMM has evolved through multiple generations with the goal to provide more clarity, rigor, and completeness in its application. This report describes the approach used to develop the fourth generation of the PCMM.
There has been a concerted effort since 2007 to establish a dashboard of metrics for the Science, Technology, and Engineering (ST&E) work at Sandia National Laboratories. These metrics are to provide a self-assessment mechanism for the ST&E Strategic Management Unit (SMU) to complement external expert review and advice and various internal self-assessment processes. The data and analysis will help ST&E Managers plan, implement, and track strategies and work in order to support the critical success factors of nurturing core science and enabling laboratory missions. The purpose of this SAND report is to provide a guide for those who want to understand the ST&E SMU metrics process. This report provides an overview of why the ST&E SMU wants a dashboard of metrics, some background on metrics for ST&E programs from existing literature and past Sandia metrics efforts, a summary of work completed to date, specifics on the portfolio of metrics that have been chosen and the implementation process that has been followed, and plans for the coming year to improve the ST&E SMU metrics process.
Sandia National Laboratories is investing in projects that aim to develop computational modeling and simulation applications that explore human cognitive and social phenomena. While some of these modeling and simulation projects are explicitly research oriented, others are intended to support or provide insight for people involved in high consequence decision-making. This raises the issue of how to evaluate computational modeling and simulation applications in both research and applied settings where human behavior is the focus of the model: when is a simulation 'good enough' for the goals its designers want to achieve? In this report, we discuss two years' worth of review and assessment of the ASC program's approach to computational model verification and validation, uncertainty quantification, and decision making. We present a framework that extends the principles of the ASC approach into the area of computational social and cognitive modeling and simulation. In doing so, we argue that the potential for evaluation is a function of how the modeling and simulation software will be used in a particular setting. In making this argument, we move from strict, engineering and physics oriented approaches to V&V to a broader project of model evaluation, which asserts that the systematic, rigorous, and transparent accumulation of evidence about a model's performance under conditions of uncertainty is a reasonable and necessary goal for model evaluation, regardless of discipline. How to achieve the accumulation of evidence in areas outside physics and engineering is a significant research challenge, but one that requires addressing as modeling and simulation tools move out of research laboratories and into the hands of decision makers. 
This report provides an assessment of our thinking on ASC Verification and Validation, and argues for further extending V&V research in the physical and engineering sciences toward a broader program of model evaluation in situations of high consequence decision-making.
This paper presents the conceptual framework that is being used to define quantification of margins and uncertainties (QMU) for application in the nuclear weapons (NW) work conducted at Sandia National Laboratories. The conceptual framework addresses the margins and uncertainties throughout the NW life cycle and includes the definition of terms related to QMU and to figures of merit. Potential applications of QMU consist of analyses based on physical data and on modeling and simulation. Appendix A provides general guidelines for addressing cases in which significant and relevant physical data are available for QMU analysis. Appendix B gives the specific guidance that was used to conduct QMU analyses in cycle 12 of the annual assessment process. Appendix C offers general guidelines for addressing cases in which appropriate models are available for use in QMU analysis. Appendix D contains an example that highlights the consequences of different treatments of uncertainty in model-based QMU analyses.
The Predictive Capability Maturity Model (PCMM) is a new model that can be used to assess the level of maturity of computational modeling and simulation (M&S) efforts. The development of the model is based on both the authors' experience and their analysis of similar investigations in the past. The perspective taken in this report is one of judging the usefulness of a predictive capability that relies on the numerical solution to partial differential equations to better inform and improve decision making. The review of past investigations, such as the Software Engineering Institute's Capability Maturity Model Integration and the National Aeronautics and Space Administration and Department of Defense Technology Readiness Levels, indicates that a more restricted, more interpretable method is needed to assess the maturity of an M&S effort. The PCMM addresses six contributing elements to M&S: (1) representation and geometric fidelity, (2) physics and material model fidelity, (3) code verification, (4) solution verification, (5) model validation, and (6) uncertainty quantification and sensitivity analysis. For each of these elements, attributes are identified that characterize four increasing levels of maturity. Importantly, the PCMM is a structured method for assessing the maturity of an M&S effort that is directed toward an engineering application of interest. The PCMM does not assess whether the M&S effort, the accuracy of the predictions, or the performance of the engineering system satisfies or does not satisfy specified application requirements.
The 9/30/2007 ASC Level 2 Post-Processing V&V Milestone (Milestone 2360) contains functionality required by the user community for certain verification and validation tasks. These capabilities include loading of edge and face data on an Exodus mesh, run-time computation of an exact solution to a verification problem, delivery of results data from the server to the client, computation of an integral-based error metric, simultaneous loading of simulation and test data, and comparison of that data using visual and quantitative methods. The capabilities were tested extensively by performing a typical ALEGRA HEDP verification task. In addition, a number of stretch criteria were met including completion of a verification task on a 13 million element mesh.
Since 1998, the Department of Energy/NNSA National Laboratories have invested millions in strategies for assessing the credibility of computational science and engineering (CSE) models used in high consequence decision making. The answer? There is no answer. There's a process--and a lot of politics. The importance of model evaluation (verification, validation, uncertainty quantification, and assessment) increases in direct proportion to the significance of the model as input to a decision. Other fields, including computational social science, can learn from the experience of the national laboratories. Some implications for evaluating 'low cognition agents'. Epistemology considers the question, How do we know what we [think we] know? What makes Western science special in producing reliable, predictive knowledge about the world? V&V takes epistemology out of the realm of thought and puts it into practice. What is the role of modeling and simulation in the production of reliable, credible scientific knowledge about the world? What steps, investments, practices do I pursue to convince myself that the model I have developed is producing credible knowledge?
Verification and validation (V&V) are the primary means to assess the accuracy and reliability of computational simulations. V&V methods and procedures have fundamentally improved the credibility of simulations in several high-consequence fields, such as nuclear reactor safety, underground nuclear waste storage, and nuclear weapon safety. Although the terminology is not uniform across engineering disciplines, code verification deals with assessing the reliability of the software coding, and solution verification deals with assessing the numerical accuracy of the solution to a computational model. Validation addresses the physics modeling accuracy of a computational simulation by comparing the computational results with experimental data. Code verification benchmarks and validation benchmarks have been constructed for a number of years in every field of computational simulation. However, no comprehensive guidelines have been proposed for the construction and use of V&V benchmarks. For example, the field of nuclear reactor safety has not focused on code verification benchmarks, but it has placed great emphasis on developing validation benchmarks. Many of these validation benchmarks are closely related to the operations of actual reactors at near-safety-critical conditions, as opposed to being more fundamental-physics benchmarks. This paper presents recommendations for the effective design and use of code verification benchmarks based on manufactured solutions, classical analytical solutions, and highly accurate numerical solutions. In addition, this paper presents recommendations for the design and use of validation benchmarks, highlighting the careful design of building-block experiments, the estimation of experimental measurement uncertainty for both inputs and outputs to the code, validation metrics, and the role of model calibration in validation. 
It is argued that the understanding of predictive capability of a computational model is built on the level of achievement in V&V activities, how closely related the V&V benchmarks are to the actual application of interest, and the quantification of uncertainties related to the application of interest.
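As a rough illustration of a code verification benchmark based on a manufactured solution, the sketch below checks the observed order of accuracy of a simple finite-difference solver. The test problem (a one-dimensional Poisson equation), the chosen solution sin(pi x), the grid sizes, and the function name `solve_poisson` are all assumptions made for this example; they are not taken from the paper.

```python
import numpy as np

def solve_poisson(n):
    """Solve -u'' = f on (0, 1) with u(0) = u(1) = 0 using n interior points.

    Returns the grid spacing and the maximum error against the
    manufactured solution u(x) = sin(pi x).
    """
    h = 1.0 / (n + 1)
    x = np.linspace(h, 1.0 - h, n)
    # Choosing u(x) = sin(pi x) fixes the source term: f = pi^2 sin(pi x).
    f = np.pi ** 2 * np.sin(np.pi * x)
    # Standard second-order central-difference operator for -u''.
    A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h ** 2
    u = np.linalg.solve(A, f)
    error = np.max(np.abs(u - np.sin(np.pi * x)))
    return h, error

# Refine the grid by a factor of two and compare the errors.
h_coarse, e_coarse = solve_poisson(31)
h_fine, e_fine = solve_poisson(63)

# Observed order of accuracy; the scheme is formally second order.
observed_order = np.log(e_coarse / e_fine) / np.log(h_coarse / h_fine)
print(f"observed order of accuracy: {observed_order:.3f}")
```

If the observed order falls well short of the formal order of the discretization, that points to a coding error or to a grid that is not yet in the asymptotic range.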
This project focused on research and algorithmic development in optimization under uncertainty (OUU) problems driven by earth penetrator (EP) designs. While taking into account uncertainty, we addressed three challenges in current simulation-based engineering design and analysis processes. The first challenge required leveraging small local samples, already constructed by optimization algorithms, to build effective surrogate models. We used Gaussian Process (GP) models to construct these surrogates. We developed two OUU algorithms using 'local' GPs (OUU-LGP) and one OUU algorithm using 'global' GPs (OUU-GGP) that appear competitive with or better than current methods. The second challenge was to develop a methodical design process based on multi-resolution, multi-fidelity models. We developed a Multi-Fidelity Bayesian Auto-regressive process (MF-BAP). The third challenge involved the development of tools that are computationally feasible and accessible. We created MATLAB® and initial DAKOTA implementations of our algorithms.
This report describes key ideas underlying the application of Quantification of Margins and Uncertainties (QMU) to nuclear weapons stockpile lifecycle decisions at Sandia National Laboratories. While QMU is a broad process and methodology for generating critical technical information to be used in stockpile management, this paper emphasizes one component, which is information produced by computational modeling and simulation. In particular, we discuss the key principles of developing QMU information in the form of Best Estimate Plus Uncertainty, the need to separate aleatory and epistemic uncertainty in QMU, and the risk-informed decision making that is best suited for decisive application of QMU. The paper is written at a high level, but provides a systematic bibliography of useful papers for the interested reader to deepen their understanding of these ideas.
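One common way to keep aleatory and epistemic uncertainty separate is nested (double-loop) sampling. The sketch below is only an illustration of that general idea, not the procedure from the report: the Gaussian load model, the threshold, the epistemic interval for the mean load, and the function name `exceedance_probability` are all hypothetical.

```python
import numpy as np

def exceedance_probability(mean_load, rng, threshold=10.0, n_inner=5000):
    """Inner (aleatory) loop: estimate P(load > threshold) for a fixed mean load."""
    loads = rng.normal(loc=mean_load, scale=1.0, size=n_inner)
    return float(np.mean(loads > threshold))

rng = np.random.default_rng(2)

# Outer (epistemic) loop: the mean load is only known to lie in an interval,
# so it is swept over that interval rather than assigned a distribution.
epistemic_means = np.linspace(7.0, 9.0, 11)
probabilities = [exceedance_probability(m, rng) for m in epistemic_means]

# The result is a range of probabilities, not a single number.
p_low, p_high = min(probabilities), max(probabilities)
print(f"exceedance probability lies in [{p_low:.4f}, {p_high:.4f}]")
```

Reporting the interval rather than one averaged probability keeps the lack-of-knowledge contribution visible to the decision maker.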
This report presents an initial validation strategy for specific SNL pulsed power program applications of the ALEGRA-HEDP radiation-magnetohydrodynamics computer code. The strategy is written to be (1) broadened and deepened with future evolution of particular specifications given in this version; (2) broadly applicable to computational capabilities other than ALEGRA-HEDP directed at the same pulsed power applications. The content and applicability of the document are highly constrained by the R&D thrust of the SNL pulsed power program. This means that the strategy has significant gaps, indicative of the flexibility required to respond to an ongoing experimental program that is heavily engaged in phenomena discovery.
This report summarizes the results of an effort to establish a framework for assigning and communicating technology readiness levels (TRLs) for the modeling and simulation (ModSim) capabilities at Sandia National Laboratories. This effort was undertaken as a special assignment for the Weapon Simulation and Computing (WSC) program office led by Art Hale, and lasted from January to September 2006. This report summarizes the results, conclusions, and recommendations, and is intended to help guide the program office in their decisions about the future direction of this work. The work was broken out into several distinct phases, starting with establishing the scope and definition of the assignment. These are characterized in a set of key assertions provided in the body of this report. Fundamentally, the assignment involved establishing an intellectual framework for TRL assignments to Sandia's modeling and simulation capabilities, including the development and testing of a process to conduct the assignments. To that end, we proposed a methodology for both assigning and understanding the TRLs, and outlined some of the restrictions that need to be placed on this process and the expected use of the result. One of the first assumptions we overturned was the notion of a "static" TRL--rather we concluded that problem context was essential in any TRL assignment, and that leads to dynamic results (i.e., a ModSim tool's readiness level depends on how it is used, and by whom). While we leveraged the classic TRL results from NASA, DoD, and Sandia's NW program, we came up with a substantially revised version of the TRL definitions, maintaining consistency with the classic level definitions and the Predictive Capability Maturity Model (PCMM) approach. In fact, we substantially leveraged the foundation the PCMM team provided, and augmented that as needed.
Given the modeling and simulation TRL definitions and our proposed assignment methodology, we conducted four "field trials" to examine how this would work in practice. The results varied substantially, but did indicate that establishing the capability dependencies and making the TRL assignments was manageable and not particularly time consuming. The key differences arose in perceptions of how this information might be used, and what value it would have (opinions ranged from negative to positive value). The use cases and field trial results are included in this report. Taken together, the results suggest that we can make reasonably reliable TRL assignments, but that using those without the context of the information that led to those results (i.e., examining the measures suggested by the PCMM table, and extended for ModSim TRL purposes) produces an oversimplified result--that is, you cannot really boil things down to just a scalar value without losing critical information.
This paper provides an overview of several approaches to formulating and solving optimization under uncertainty (OUU) engineering design problems. In addition, the topic of high-performance computing and OUU is addressed, with a discussion of the coarse- and fine-grained parallel computing opportunities in the various OUU problem formulations. The OUU approaches covered here are: sampling-based OUU, surrogate model-based OUU, analytic reliability-based OUU (also known as reliability-based design optimization), polynomial chaos-based OUU, and stochastic perturbation-based OUU.
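To make the first of these approaches concrete, the sketch below shows a minimal sampling-based OUU loop: a robustness measure (mean plus a multiple of the standard deviation) is estimated from samples of the uncertain input and then minimized over the design variable. The response function, the normal distribution for the uncertain input, the weight `k`, and the brute-force grid search are all assumptions made for this example, not formulations taken from the paper.

```python
import numpy as np

def response(d, u):
    """Hypothetical response: quadratic in the design d, shifted by the uncertain input u."""
    return (d - 1.0 - u) ** 2 + 0.5 * d

def robust_objective(d, u_samples, k=1.0):
    """Sample estimate of mean + k * standard deviation of the response at design d."""
    values = response(d, u_samples)
    return values.mean() + k * values.std(ddof=1)

rng = np.random.default_rng(1)
# Draw the uncertain input once and reuse the same samples for every design,
# so that differences between designs are not masked by sampling noise.
u_samples = rng.normal(loc=0.0, scale=0.3, size=2000)

# Brute-force search over a grid of candidate designs.
candidates = np.linspace(-1.0, 3.0, 401)
scores = np.array([robust_objective(d, u_samples) for d in candidates])
best_d = candidates[np.argmin(scores)]
print(f"robust design: d = {best_d:.2f}")
```

Each candidate design is independent of the others, and each sample within a design is independent too, which is where the coarse- and fine-grained parallelism mentioned above would enter. In practice the grid search would be replaced by a proper optimizer.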
This report is a white paper summarizing the literature and different approaches to the problem of calibrating computer model parameters in the face of model uncertainty. Model calibration is often formulated as finding the parameters that minimize the squared difference between the model-computed data (the predicted data) and the actual experimental data. This approach does not allow for explicit treatment of uncertainty or error in the model itself: the model is considered the "true" deterministic representation of reality. While this approach does have utility, it is far from an accurate mathematical treatment of the true model calibration problem in which both the computed data and experimental data have error bars. This year, we examined methods to perform calibration accounting for the error in both the computer model and the data, as well as improving our understanding of its meaning for model predictability. We call this approach Calibration under Uncertainty (CUU). This talk presents our current thinking on CUU. We outline some current approaches in the literature, and discuss the Bayesian approach to CUU in detail.
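The conventional least-squares formulation described above can be sketched in a few lines. The straight-line model, the synthetic data, and the names `model` and `calibrate_least_squares` are hypothetical and serve only to illustrate the deterministic baseline; this is not the CUU method itself.

```python
import numpy as np

def model(theta, x):
    """Hypothetical computer model: a straight line with slope and intercept."""
    slope, intercept = theta
    return slope * x + intercept

def calibrate_least_squares(x, y_obs):
    """Return the parameters that minimize the sum of squared residuals."""
    # Design matrix for the linear model y = slope * x + intercept.
    A = np.column_stack([x, np.ones_like(x)])
    theta, *_ = np.linalg.lstsq(A, y_obs, rcond=None)
    return theta

# Synthetic "experimental" data: the model evaluated at known parameters plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
true_theta = np.array([2.0, -1.0])
y_obs = model(true_theta, x) + rng.normal(scale=0.05, size=x.size)

theta_hat = calibrate_least_squares(x, y_obs)
residuals = y_obs - model(theta_hat, x)
sse = float(residuals @ residuals)
print("calibrated parameters:", theta_hat)
print("sum of squared errors:", sse)
```

The fit returns a single parameter vector and treats the model form as exact. That is the limitation noted above: nothing in this formulation represents error in the model itself.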
Views on the state of the art in verification and validation (V&V) in computational physics are discussed. These views are described within a framework in which predictive capability relies on V&V as well as on other factors that affect predictive capability. Research topics addressed include the development of improved procedures for using the phenomena identification and ranking table (PIRT) to prioritize V&V activities, and the method of manufactured solutions for code verification. Also addressed are the development and use of hierarchical validation diagrams, and the construction and use of validation metrics incorporating statistical measures.