Publications
Resilient data staging through MxN distributed transactions
Lofstead, Gerald F.; Oldfield, Ron A.
Scientific computing-driven discoveries are frequently driven from workflows that use persistent storage as a staging area for data between operations. With the bad and progressively worse bandwidth vs. data size issues as we continue towards exascale, eliminating persistent storage through techniques like data staging will both enable these workflows to continue online, but also enable more interactive workflows reducing the time to scientific discoveries. Data staging has shown to be an effective way for applications running on high-end computing platforms to offload expensive I/O operations and to manage the tremendous amounts of data they produce. This data staging approach, however, lacks the ACID style guarantees traditional straight-to-disk methods provide. Distributed transactions are a proven way to add ACID properties to data movements, however distributed transactions follow 1xN data movement semantics, where our highly parallel HPC environments employ MxN data movement semantics. In this paper we present a novel protocol that extends distributed transaction terminology to include MxN semantics which allows our data staging areas to benefit from ACID properties. We show that with our protocol we can provide resilient data staging with a limited performance penalty over current data staging implementations.