Optimization-based property-preserving solution recovery for fault-tolerant scalar transport
Proceedings of the 6th European Conference on Computational Mechanics: Solids, Structures and Coupled Problems, ECCM 2018 and 7th European Conference on Computational Fluid Dynamics, ECFD 2018
As the mean time between failures on future high-performance computing platforms is expected to decrease to just a few minutes, the development of “smart”, property-preserving checkpointing schemes becomes imperative to avoid dramatic decreases in application utilization. In this paper, we formulate a generic optimization-based approach for fault-tolerant computations, which separates property preservation from the compression and recovery stages of the checkpointing process. We then specialize the approach to obtain a fault recovery procedure for a model scalar transport equation; the procedure preserves local solution bounds and total mass. Numerical examples showing solution recovery from a corrupted application state for three different failure modes illustrate the potential of the approach.
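To make the idea of bounds- and mass-preserving recovery concrete, the following is a minimal sketch, not the formulation used in the paper, which the abstract does not spell out. It assumes the recovered state is the point closest to a corrupted or decompressed target in a weighted least-squares sense, subject to per-cell bounds and one total-mass constraint. Under that assumption the minimizer is the target shifted by a single scalar and clipped to the bounds, and the scalar can be found by bisection. The function name `recover_state`, its parameters, and the sample data are all hypothetical.

```python
def recover_state(target, lo, hi, weights, mass, tol=1e-12, max_iter=200):
    """Sketch: minimize sum_i w_i * (u_i - target_i)**2
    subject to lo_i <= u_i <= hi_i and sum_i w_i * u_i == mass.

    Assumed formulation (hypothetical, for illustration only). The KKT
    conditions give u_i = clip(target_i + lam, lo_i, hi_i) for one scalar lam.
    """
    def clipped(lam):
        # Shift every entry by lam, then enforce the local bounds.
        return [min(max(t + lam, l), h) for t, l, h in zip(target, lo, hi)]

    def total(u):
        # Discrete total mass: weighted sum of cell values.
        return sum(w * x for w, x in zip(weights, u))

    # The mass constraint must be compatible with the bounds.
    if not (total(lo) - tol <= mass <= total(hi) + tol):
        raise ValueError("prescribed mass is not attainable within the bounds")

    # Bracket lam: at lam_min every entry sits at its lower bound,
    # at lam_max every entry sits at its upper bound.
    lam_min = min(l - t for t, l in zip(target, lo))
    lam_max = max(h - t for t, h in zip(target, hi))

    lam = 0.5 * (lam_min + lam_max)
    for _ in range(max_iter):
        lam = 0.5 * (lam_min + lam_max)
        m = total(clipped(lam))
        if abs(m - mass) <= tol:
            break
        # total(clipped(lam)) is nondecreasing in lam, so bisection applies.
        if m < mass:
            lam_min = lam
        else:
            lam_max = lam
    return clipped(lam)


# Hypothetical corrupted state on four equal cells; two entries violate [0, 1].
target = [0.2, 1.7, -0.3, 0.9]
lo = [0.0] * 4
hi = [1.0] * 4
weights = [0.25] * 4
mass = 0.5

u = recover_state(target, lo, hi, weights, mass)
recovered_mass = sum(w * x for w, x in zip(weights, u))
```

Keeping the property constraints in a separate optimization step like this means the target can come from any compression or reconstruction scheme, which mirrors the separation of concerns described in the abstract.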