Live feed Sandia CAPVIZ HPC cluster performance analysis & visualization demonstration

Allan, Benjamin A.; Schmitz, Mark E.; Walsh, Edward J.; Aguilar, Michael J.; Brandt, James M.; Gentile, Ann C.; Ogden, Jeffry B.; Monk, Stephen T.; Noe, John P.

Abstract not provided.

TYPE Conference Poster YEAR 2017

Measuring minimum switch port metric retrieval time and impact for multi-layer InfiniBand fabrics

Proceedings - IEEE International Conference on Cluster Computing, ICCC

Aguilar, Michael J.; Allan, Benjamin A.; Polevitzky, Sergei I.

In this work, we seek to gain an understanding of the InfiniBand network processing limitations that might exist in gathering performance metric information from InfiniBand switches using our new LDMS ibfabric sampler. The limitations studied consist of delays in gathering InfiniBand metric information from a specific switch device due to the switch's processor response delays or RDMA contention for network bandwidth.

TYPE Conference Poster YEAR 2017

Measuring Minimum Switch Port Metric Retrieval Time and Impact for Multi-Layer InfiniBand Fabrics

Allan, Benjamin A.; Aguilar, Michael J.; Polevitzky, Sergei P.

Abstract not provided.

TYPE Conference Poster YEAR 2017

Runtime collection and analysis of system metrics for production monitoring of Trinity Phase II

DeConinck, Adam D.; Nam, Hai A.; Mortin, Dave M.; Bonnie, Amanda B.; Lueninghoener, Cory L.; Brandt, James M.; Gentile, Ann C.; Pedretti, Kevin P.; Agelastos, Anthony M.; Vaughan, Courtenay T.; Hammond, Simon D.; Allan, Benjamin A.; Davis, Michael C.; Repik, Jason

Abstract not provided.

TYPE Conference Poster YEAR 2017

Runtime collection and analysis of system metrics for production monitoring of Trinity Phase II (Paper)

DeConinck, Adam D.; Nam, Hai A.; Morton, David P.; Bonnie, Amanda B.; Lueninghoener, Cory L.; Brandt, James M.; Gentile, Ann C.; Pedretti, Kevin P.; Agelastos, Anthony M.; Vaughan, Courtenay T.; Hammond, Simon D.; Allan, Benjamin A.; Davis, Mike D.; Repik, Jason

Abstract not provided.

TYPE Conference Poster YEAR 2017

Continuous whole-system monitoring toward rapid understanding of production HPC applications and systems

Parallel Computing

Agelastos, Anthony M.; Allan, Benjamin A.; Brandt, James M.; Gentile, Ann C.; Lefantzi, Sophia L.; Monk, Stephen T.; Ogden, Jeffry B.; Rajan, Mahesh R.; Stevenson, Joel O.

A detailed understanding of HPC applications’ resource needs and their complex interactions with each other and HPC platform resources is critical to achieving scalability and performance. Such understanding has been difficult to achieve because typical application profiling tools do not capture the behaviors of codes under the potentially wide spectrum of actual production conditions and because typical monitoring tools do not capture system resource usage information with high enough fidelity to gain sufficient insight into application performance and demands. In this paper, we present both system and application profiling results based on data obtained through synchronized system-wide monitoring on a production HPC cluster at Sandia National Laboratories (SNL). We demonstrate analytic and visualization techniques that we are using to characterize application and system resource usage under production conditions for better understanding of application resource needs. Our goals are to improve application performance (through understanding application-to-resource mapping and system throughput) and to ensure that future system capabilities match their intended workloads.

TYPE Journal Article YEAR 2016

Understanding HPC System and Application Behaviors

De La Cruz, Andrew F.; Lefantzi, Sophia L.; Allan, Benjamin A.

Abstract not provided.

TYPE Presentation YEAR 2016

Lightweight Distributed Metric Service: Production analytics overview for LDMS v2

Allan, Benjamin A.; Lefantzi, Sophia L.; Walsh, Edward J.; Ogden, Jeffry B.; Gauntt, Nathan E.

Abstract not provided.

TYPE Presentation YEAR 2016
