Publications

Results 1–25 of 75
Skip to search filters

Sensitivity and Uncertainty Workflow of Full System SIERRA Models Supporting High Consequence Applications

Orient, George E.; Clay, Robert L.; Friedman-Hill, Ernest J.; Pebay, Philippe P.; Ridgway, Elliott M.

Credibility of end-to-end CompSim (Computational Simulation) models and their agile execution requires an expressive framework to describe, communicate and execute complex computational tool chains representing the model. All stakeholders from system engineering and customers through model developers and V&V partners need views and functionalities of the workflow representing the model in a manner that is natural to their discipline. In the milestone and in this report we define workflow as a network of computation simulation activities executed autonomously on a distributed set of computational platforms. The FY19 ASC L2 Milestone (6802) for the Integrated Workflow (IWF) project was designed to integrate and improve existing capabilities or develop new functionalities to provide a wide range of stakeholders a coherent and intuitive platform capable of defining and executing CompSim modeling from analysis workflow definition to complex ensemble calculations. The main goal of the milestone was to advance the integrated workflow capabilities to support the weapon system analysts with a production deployment in FY20. Ensemble calculations supporting program decisions include sensitivity analysis, optimization and uncertainty quantification. The goal of the L2 milestone aligned with the ultimate goal of the IWF project is to foster cultural and technical shift toward and integrated CompSim capability based on automated workflows. Specific deliverables were defined in five broad categories: 1) Infrastructure, including development of distributed-computing workflow capability, 2) integration of Dakota (Sandia's sensitivity, optimization and UQ engine) with SAW (Sandia Analysis Workbench), 3) ARG (Automatic Report Generator introspecting analysis artifacts and generating human-readable extensible and archivable reports), 4) Libraries and Repositories aiding capability reuse, and 5) Exemplars to support training, capturing best practices and stress testing of the platform. A set of exemplars was defined to represent typical weapon system qualification CompSim projects. Analyzing the required capabilities and using the findings to plan implementation of required capabilities ensured optimal allocation of development resources focused on production deployment after the L2 is completed. It was recognized early that the end-to-end modeling applications pose a considerable number of diverse risks, and a formal risk tracking process was implemented. The project leveraged products, capabilities and development tasks of IWF partners. SAW, Dakota, Cubit, Sierra, Slycat, and NGA (NexGen Analytics, a small business) contributed to the integrated platform developed during this milestone effort. New products delivered include: a) NGW (Next Generation Workflow) for robust workflow definition and execution, b) Dakota wizards, editor and results visualization, and c) the automatic report generator ARG. User engagement was initiated early in the development process eliciting concrete requirements and actionable feedback to assure that the integrated CompSim capability will have high user acceptance and impact. The current integrated capabilities have been demonstrated and are continually being tested by a set of exemplars ranging from training scenarios to computationally demanding uncertainty analyses. The integrated workflow platform has been deployed on both SRN (Sandia Restricted Network) and SCN (Sandia Classified Network). Computational platforms where the system has been demonstrated span from Windows (Creo the CAD platform chosen by Sandia) to Trinity HPC (Sierra and CTH solvers). Follow up work will focus on deployment at SNL and other sites in the nuclear enterprise (LLNL, KCNSC), training and consulting support to democratize the analysis agility, process health and knowledge management benefits the NGW platform provides. ACKNOWLEDGEMENTS The IWF team would like to acknowledge the consistent support from the ASC sponsors: Scott Hutchinson, Walt Witkowski, Ken Alvin, Tom Klitsner, Jeremy Templeton, Erik Strack, and Amanda Dodd. Without their support this integrated effort would not have been possible. We would also like to thank the milestone review panel for their insightful feedback and guidance throughout the year: Martin Heinstein, Patty Hough, Jay Dike, Dan Laney (LLNL), and Jay Billings (ORNL). And of course, without the hard work of the IWF team none of this would have happened.

More Details

ASC CSSE Level 2 Milestone #6362: Resilient Asynchronous Many Task Programming Model

Teranishi, Keita T.; Kolla, Hemanth K.; Slattengren, Nicole S.; Whitlock, Matthew J.; Mayo, Jackson M.; Clay, Robert L.; Paul, Sri R.; Hayashi, Akihiro H.; Sarkar, Vivek S.

This report is an outcome of the ASC CSSE Level 2 Milestone 6362: Analysis of Re- silient Asynchronous Many-Task (AMT) Programming Model. It comprises a summary and in-depth analysis of resilience schemes adapted to the AMT programming model. Herein, performance trade-offs of a resilient-AMT prograrnming model are assessed through two ap- proaches: (1) an analytical model realized by discrete event simulations and (2) empirical evaluation of benchmark programs representing regular and irregular workloads of explicit partial differential equation solvers. As part of this effort, an AMT execution simulator and a prototype resilient-AMT programming framework have been developed. The former permits us to hypothesize the performance behavior of a resilient-AMT model, and has undergone a verification and validation (V&V) process. The latter allows empirical evaluation of the perfor- mance of resilience schemes under emulated program failures and enabled the aforementioned V&V process. The outcome indicates that (1) resilience techniques implemented within an AMT framework allow efficient and scalable recovery under frequent failures, that (2) the abstraction of task and data instances in the AMT programming model enables readily us- able Application Program Interfaces (APIs) for resilience, and that (3) this abstraction enables predicting the performance of resilient-AMT applications with a simple simulation infrastruc- ture. This outcome will provide guidance for the design of the AMT programming model and runtime systems, user-level resilience support, and application development for ASC's next generation platforms (NGPs).

More Details

Instructions for the Installation and Testing on a Windows System of the Sandia Automatic Report Generator

PERRINEL, MERIADEG G.; Pebay, Philippe P.; Clay, Robert L.

This report is a sequel to [PC18], where we provided the detailed installation and testing instructions of Sandia's currently-being-developed Automatic Report Genera- tor (ARG), for both Linux and macOS target platforms. In the current report, we extend these instructions to the case of Windows systems.

More Details
Results 1–25 of 75
Results 1–25 of 75