Publications

Results 26–50 of 60

Evolving the message passing programming model via a fault-tolerant, object-oriented transport layer

FTXS 2015 - Proceedings of the 2015 Workshop on Fault Tolerance for HPC at eXtreme Scale, Part of HPDC 2015

Wilke, Jeremiah J.; Kolla, Hemanth K.; Teranishi, Keita T.; Hollman, David S.; Bennett, Janine C.; Slattengren, Nicole S.

In this position paper, we argue for improved fault tolerance of an MPI code by introducing lightweight virtualization into the MPI interface. In particular, we outline key-value store semantics for MPI send/recv calls, thereby creating a far more expressive programming model. The general message passing semantics and imperative style of MPI application codes would remain essentially unchanged. However, the additional expressibility of the programming model 1) enables the underlying transport layer to handle fault tolerance more transparently to the application developer, and 2) provides an evolutionary code path towards more declarative asynchronous programming models. The core contribution of this paper is an initial implementation of the DHARMA transport layer that provides the new, required functionality to support the MPI key-value store model.

Lessons Learned from Porting the MiniAero Application to Charm++

Hollman, David S.; Bennett, Janine C.; Wilke, Jeremiah J.; Kolla, Hemanth K.; Lin, Paul L.; Slattengren, Nicole S.; Teranishi, Keita T.; Franko, Ken F.; Jain, Nikhil J.; Mikida, Eric M.

Abstract not provided.

Extreme-scale viability of collective communication for resilient task scheduling and work stealing

Proceedings of the International Conference on Dependable Systems and Networks

Wilke, Jeremiah J.; Bennett, Janine C.; Kolla, Hemanth K.; Teranishi, Keita T.; Slattengren, Nicole S.; Floren, John F.

Extreme-scale computing will bring significant changes to high performance computing system architectures. In particular, the increased number of system components is creating a need for software to demonstrate 'pervasive parallelism' and resiliency. Asynchronous, many-task programming models show promise in addressing both the scalability and resiliency challenges; however, they introduce an enormously challenging distributed, resilient consistency problem. In this work, we explore the viability of resilient collective communication in task scheduling and work stealing and, through simulation with SST/macro, the performance of these collectives on speculative extreme-scale architectures.
