Publications / SAND Report

Exploring the Interplay of Resilience and Energy Consumption for a Task-Based Partial Differential Equations Preconditioner

Rizzi, Francesco N.; Morris Wright, Karla V.; Sargsyan, Khachik S.; Mycek, Paul M.; Safta, Cosmin S.; Le Maitre, Olivier L.; Knio, Omar K.; Debusschere, Bert D.

We discuss algorithm-based resilience to silent data corruption (SDC) in a task-based domain-decomposition preconditioner for partial differential equations (PDEs). The algorithm exploits a reformulation of the PDE as a sampling problem, followed by a solution update through data manipulation that is resilient to SDC. The implementation is based on a server-client model where all state information is held by the servers, while clients are designed solely as computational units. Scalability tests run on up to approximately 51K cores show a parallel efficiency greater than 90%. We use a 2D elliptic PDE and a fault model based on random single bit-flips to demonstrate the resilience of the application to synthetically injected SDC. We discuss two fault scenarios: one based on the corruption of all data of a target task, and the other involving the corruption of a single data point. We show that for our application, given the test problem considered, a four-fold increase in the number of faults raises the overhead needed to overcome them by only 2 percentage points, from 7% to 9%. We then discuss potential savings in energy consumption via dynamic voltage/frequency scaling, and its interplay with fault rates and application overhead.

Sandia National Laboratories, Livermore, CA (fnrizzi@sandia.gov)
Sandia National Laboratories, Livermore, CA (knmorri@sandia.gov)
Sandia National Laboratories, Livermore, CA (ksargsy@sandia.gov)
Duke University, Durham, NC (paul.mycek@duke.edu)
Sandia National Laboratories, Livermore, CA (csafta@sandia.gov)
Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur, Orsay, France (olm@limsi.fr)
Duke University, Durham, NC (omar.knio@duke.edu)
Sandia National Laboratories, Livermore, CA (bjdebus@sandia.gov)
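
The random single bit-flip fault model mentioned in the abstract can be illustrated with a short sketch. This is not the authors' injection code: the function names `flip_bit` and `flip_random_bit` are hypothetical, and the sketch assumes the corrupted data are IEEE 754 double-precision values, which the abstract does not state.

```python
import random
import struct


def flip_bit(value, bit):
    """Return `value` with bit `bit` (0 = least significant) of its
    64-bit IEEE 754 encoding flipped. Hypothetical helper for illustration."""
    if not 0 <= bit < 64:
        raise ValueError("bit index must be in [0, 64)")
    # Reinterpret the double as an unsigned 64-bit integer.
    (bits,) = struct.unpack("<Q", struct.pack("<d", value))
    # XOR with a one-bit mask flips exactly that bit.
    (corrupted,) = struct.unpack("<d", struct.pack("<Q", bits ^ (1 << bit)))
    return corrupted


def flip_random_bit(value, rng=random):
    """Flip one uniformly chosen bit of `value`.
    Returns the corrupted value and the index of the flipped bit."""
    bit = rng.randrange(64)
    return flip_bit(value, bit), bit


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the injected fault is reproducible
    corrupted, bit = flip_random_bit(1.0, rng)
    print(f"flipped bit {bit}: 1.0 became {corrupted!r}")
```

The effect of a flip depends strongly on which bit is hit: flipping the sign bit (bit 63) only negates the value, a low mantissa bit changes it by a tiny relative amount, and an exponent bit can change it by many orders of magnitude.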