Publications

17 Results

A Reference Architecture For Emulytics™ Clusters

Floren, John F.; Friesen, Jerrold A.; Ulmer, Craig D.; Jones, Stephen T.

In this document we describe a reference architecture developed for Emulytics™ clusters at Sandia National Laboratories. Taking into consideration the constraints of our Emulytics software and the requirements for integration with the larger computing facilities at Sandia, we developed a cluster platform suitable for use by Sandia's several Emulytics toolsets and also useful for more general large-scale computing tasks.

Extreme-scale viability of collective communication for resilient task scheduling and work stealing

Proceedings of the International Conference on Dependable Systems and Networks

Wilke, Jeremiah J.; Bennett, Janine C.; Kolla, Hemanth K.; Teranishi, Keita T.; Slattengren, Nicole S.; Floren, John F.

Extreme-scale computing will bring significant changes to high performance computing system architectures. In particular, the increased number of system components is creating a need for software to demonstrate 'pervasive parallelism' and resiliency. Asynchronous, many-task programming models show promise in addressing both the scalability and resiliency challenges; however, they introduce an enormously challenging distributed, resilient consistency problem. In this work, we explore the viability of resilient collective communication in task scheduling and work stealing and, through simulation with SST/macro, the performance of these collectives on speculative extreme-scale architectures.
