Publications Search

We discuss algorithm-based resilience to silent data corruption (SDC) in a task-based domain-decomposition preconditioner for partial differential equations (PDEs). The algorithm exploits a reformulation of the PDE as a sampling problem, followed by a solution update through data manipulation that is resilient to SDC. The implementation is based on a server-client model in which all state information is held by the servers, while the clients serve solely as computational units. Scalability tests run on up to 51K cores show a parallel efficiency greater than 90%. We use a 2D elliptic PDE and a fault model based on random single bit flips to demonstrate the resilience of the application to synthetically injected SDC. We discuss two fault scenarios: one based on the corruption of all data of a target task, and the other involving the corruption of a single data point. We show that, for our application and the test problem considered, a four-fold increase in the number of faults raises the overhead needed to overcome them by only 2 percentage points, from 7% to 9%. We then discuss potential savings in energy consumption via dynamic voltage and frequency scaling (DVFS), and the interplay of such savings with fault rates and application overhead.

Author affiliations:
Sandia National Laboratories, Livermore, CA (fnrizzi@sandia.gov)
Sandia National Laboratories, Livermore, CA (knmorri@sandia.gov)
Sandia National Laboratories, Livermore, CA (ksargsy@sandia.gov)
Duke University, Durham, NC (paul.mycek@duke.edu)
Sandia National Laboratories, Livermore, CA (csafta@sandia.gov)
Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur, Orsay, France (olm@limsi.fr)
Duke University, Durham, NC (omar.knio@duke.edu)
Sandia National Laboratories, Livermore, CA (bjdebus@sandia.gov)
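
The fault model in the abstract above injects random single bit flips into task data. The following is a minimal Python sketch of such an injector, assuming the data are IEEE-754 double-precision values. The function names are hypothetical and are not taken from the authors' code, and "corruption of all data of a target task" is interpreted here, as an assumption, as one independent random bit flip in every entry; the study itself may define it differently.

```python
import random
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value` with one bit of its IEEE-754 binary64 encoding inverted.

    Bit 0 is the least significant mantissa bit, bits 52-62 hold the
    exponent, and bit 63 is the sign bit.
    """
    if not 0 <= bit < 64:
        raise ValueError("bit must be in [0, 63]")
    (raw,) = struct.unpack("<Q", struct.pack("<d", value))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", raw ^ (1 << bit)))
    return flipped


def corrupt_single_point(data, rng):
    """Hypothetical scenario 2: one random bit flip in one random entry."""
    out = list(data)  # leave the caller's data untouched
    i = rng.randrange(len(out))
    out[i] = flip_bit(out[i], rng.randrange(64))
    return out


def corrupt_all_points(data, rng):
    """Hypothetical scenario 1 (our interpretation): an independent random
    bit flip in every entry of the task's data."""
    return [flip_bit(x, rng.randrange(64)) for x in data]
```

For example, `flip_bit(1.0, 63)` returns `-1.0` (sign bit) and `flip_bit(1.0, 52)` returns `0.5` (lowest exponent bit), whereas flipping bit 0 changes `1.0` only in its last binary digit. Injected faults therefore range from numerically negligible to gross, depending on which bit is hit.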

TYPE SAND Report YEAR 2016

OSTI DOI

Interplay of Resilience and Energy Consumption for a Task-Based Partial Differential Equations Preconditioner

Rizzi, Francesco N.; Morris Wright, Karla V.; Sargsyan, Khachik S.; Safta, Cosmin S.; Mycek, Paul M.; Le Maitre, Olivier L.; Knio, Omar K.; Debusschere, Bert D.

Abstract not provided.

TYPE Conference Poster YEAR 2016

OSTI

Uncertainty Quantification in LES Computations of Turbulent Multiphase Combustion in a Scramjet Engine

Najm, H.N.; Debusschere, Bert D.; Safta, Cosmin S.; Sargsyan, Khachik S.; Oefelein, Joseph C.; Lacaze, Guilhem M.; Eldred, Michael S.; Knio, Omar K.; Scovazzi, G.S.; Marzouk, Y.M.; Ghanem, R.G.

Abstract not provided.

TYPE Conference Poster YEAR 2016

OSTI

Implementation of UQ Workflows with the C++/Python UQTk Toolkit

Safta, Cosmin S.; Chowdhary, Kamaljit S.; Sargsyan, Khachik S.; Debusschere, Bert D.

Abstract not provided.

TYPE Conference Poster YEAR 2016

OSTI

Quantifying Performance of a Resilient Elliptic PDE Solver on Uncertain Architectures using SST/macro

Rizzi, Francesco N.; Morris Wright, Karla V.; Sargsyan, Khachik S.; Mycek, Paul M.; Safta, Cosmin S.; Le Maitre, Olivier L.; Knio, Omar K.; Debusschere, Bert D.

Abstract not provided.

TYPE Conference Poster YEAR 2016

OSTI

A Soft and Hard Faults Resilient Solver for 2D Elliptic PDEs via Server-Client Implementation

Rizzi, Francesco N.; Morris Wright, Karla V.; Sargsyan, Khachik S.; Mycek, Paul M.; Safta, Cosmin S.; Le Maitre, Olivier L.; Knio, Omar K.; Debusschere, Bert D.

Abstract not provided.

TYPE Conference Poster YEAR 2016

OSTI

Calibration and Comparison of Climate Models: Accounting for Structural and Discretization Error [Poster]

Debusschere, Bert D.; Najm, H.N.; Sargsyan, Khachik S.; Chowdhary, Kamaljit S.; Lucas, Don L.; Bulaevskaya, Vera B.; Qian, Yun Q.; Ghan, Steve G.; Rosa, Daniele R.; Collins, Bill C.

Abstract not provided.

TYPE Conference Poster YEAR 2016

OSTI

Calibration and Comparison of Climate Models: Accounting for Structural and Discretization Error

Debusschere, Bert D.; Najm, H.N.; Sargsyan, Khachik S.; Chowdhary, Kamaljit S.; Lucas, Don L.; Bulaevskaya, Vera B.; Qian, Yun Q.; Ghan, Steve G.; Rosa, Daniele R.; Collins, William P.

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

UQ in Molecular Dynamics Simulations: Forward and Inverse Problem

Rizzi, Francesco N.; Knio, Omar K.; Jones, Reese E.; Adalsteinsson, Helgi A.; Najm, H.N.; Sargsyan, Khachik S.; Salloum, Maher S.; Safta, Cosmin S.; Debusschere, Bert D.

Abstract not provided.

TYPE Presentation YEAR 2015

OSTI

Sparse Polynomial Chaos Surrogate for ACME Land Model via Iterative Bayesian Compressive Sensing

Sargsyan, Khachik S.; Safta, Cosmin S.; Najm, H.N.; Debusschere, Bert D.; Ricciuto, Dan R.; Thornton, Peter T.

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Partial differential equations preconditioner resilient to soft and hard faults

Proceedings - IEEE International Conference on Cluster Computing, ICCC

Rizzi, Francesco N.; Morris Wright, Karla V.; Sargsyan, Khachik S.; Mycek, Paul; Safta, Cosmin S.; Le Maitre, Olivier; Knio, Omar; Debusschere, Bert D.

We present a domain-decomposition-based preconditioner for the solution of partial differential equations (PDEs) that is resilient to both soft and hard faults. The algorithm proceeds in four steps: first, the computational domain is split into overlapping subdomains; second, the target PDE is solved on each subdomain for sampled values of the local current boundary conditions; third, the subdomain solution samples are collected and fed into a regression step to build maps between the subdomains' boundary conditions; finally, the intersection of these maps yields the updated state at the subdomain boundaries. This reformulation allows us to recast the problem as a set of independent tasks. The implementation relies on an asynchronous server-client framework, in which one or more reliable servers hold the data while the clients request tasks and execute them. This framework provides resilience to hard faults: if a client crashes, it stops requesting work, and the servers simply distribute the work among the remaining live clients. Erroneous subdomain solves (e.g., due to soft faults) appear as corrupted data, which are either rejected, if they cause a task to fail, or seamlessly filtered out during the regression stage through a suitable noise model. Three types of faults are modeled: hard faults, i.e., nodes (or clients) crashing; soft faults occurring during the communication of tasks between server and clients; and soft faults occurring during task execution. We demonstrate the resilience of the approach for a 2D elliptic PDE and explore the effect of the faults at various failure rates.
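
The regression stage is where corrupted subdomain solves can be filtered out. For a linear elliptic PDE with a linear discretization, the solution value at a fixed point inside a subdomain (e.g., where a neighbor's boundary lies) is an affine function of that subdomain's boundary data, so corrupted samples appear as points far off an affine fit. The sketch below illustrates the idea with a RANSAC-style fit: fit many minimal random subsets exactly, keep the candidate that agrees with the most samples within a tolerance, then refit by least squares on that consensus set. RANSAC is a stand-in chosen for brevity, not necessarily the noise model used by the authors; the two-input map, the function names, and the `tol`, `trials`, and `seed` parameters are illustrative assumptions.

```python
import math
import random


def lstsq(rows, targets):
    """Least-squares coefficients for targets ~ rows @ coef, via the normal
    equations and Gaussian elimination with partial pivoting (tiny systems only)."""
    n = len(rows[0])
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
           + [sum(r[i] * y for r, y in zip(rows, targets))] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda row: abs(aug[row][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]  # raises ZeroDivisionError if singular
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    coef = [0.0] * n
    for i in reversed(range(n)):
        acc = aug[i][n] - sum(aug[i][k] * coef[k] for k in range(i + 1, n))
        coef[i] = acc / aug[i][i]
    return coef


def predict(coef, x):
    """Evaluate the affine map coef[0] + coef[1]*x[0] + coef[2]*x[1] + ..."""
    return coef[0] + sum(c * xi for c, xi in zip(coef[1:], x))


def robust_affine_fit(inputs, outputs, tol=1e-8, trials=200, seed=0):
    """RANSAC-style affine fit; returns (coef, indices of samples kept)."""
    rng = random.Random(seed)
    # Non-finite outputs (inf/NaN, e.g. from an exponent-bit flip) are dropped outright.
    candidates = [i for i, y in enumerate(outputs) if math.isfinite(y)]
    n_coef = len(inputs[0]) + 1
    best = []
    for _ in range(trials):
        subset = rng.sample(candidates, n_coef)
        try:
            coef = lstsq([(1.0, *inputs[i]) for i in subset],
                         [outputs[i] for i in subset])
        except ZeroDivisionError:
            continue  # degenerate subset, try another
        inliers = [i for i in candidates
                   if abs(outputs[i] - predict(coef, inputs[i])) <= tol]
        if len(inliers) > len(best):
            best = inliers
    if len(best) < n_coef:
        raise ValueError("no consistent affine map found")
    # Final least-squares refit on the consensus set only.
    coef = lstsq([(1.0, *inputs[i]) for i in best], [outputs[i] for i in best])
    return coef, best
```

Calling `robust_affine_fit(samples, values)` on sampled boundary data in which a few outputs are grossly corrupted returns the coefficients of the uncorrupted map together with the indices it kept. Corruptions smaller than `tol` (for instance, flips of low-order mantissa bits) are simply absorbed as noise rather than rejected.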

TYPE Conference Poster YEAR 2015

Scopus OSTI DOI

Calibration and Forward Uncertainty Propagation for Large-eddy Simulations of Engineering Flows

Templeton, Jeremy A.; Blaylock, Myra L.; Domino, Stefan P.; Hewson, John C.; Kumar, Pritvi R.; Ling, Julia L.; Najm, H.N.; Ruiz, Anthony R.; Safta, Cosmin S.; Sargsyan, Khachik S.; Stewart, Alessia S.; Wagner, Gregory L.

The objective of this work is to investigate the efficacy of using calibration strategies from Uncertainty Quantification (UQ) to determine model coefficients for LES. As the target methods are for engineering LES, uncertainty from numerical aspects of the model must also be quantified. The ultimate goal of this research thread is to generate a cost-versus-accuracy curve for LES such that the cost could be minimized given an accuracy prescribed by an engineering need. Realization of this goal would enable LES to serve as a predictive simulation tool within the engineering design process.

TYPE SAND Report YEAR 2015

OSTI DOI