Publications Search

5 Results

Design Installation and Operation of the Vortex ART Platform

Gauntt, Nathan E.; Davis, Kevin D.; Repik, Jason; Brandt, James M.; Gentile, Ann C.; Hammond, Simon D.

Abstract not provided.

Type: Other Report | Year: 2019

Large-Scale System Monitoring Experiences and Recommendations

Proceedings - IEEE International Conference on Cluster Computing, ICCC

Ahlgren, Ville; Andersson, Stefan; Brandt, James M.; Cardo, Nicholas; Chunduri, Sudheer; Enos, Jeremy; Fields, Parks; Gentile, Ann C.; Gerber, Richard; Gienger, Michael; Greenseid, Joe; Greiner, Annette; Hadri, Bilel; He, Yun; Hoppe, Dennis; Kaila, Urpo; Kelly, Kaki; Klein, Mark; Kristiansen, Alex; Leak, Steve; Mason, Mike; Pedretti, Kevin P.; Piccinali, Jean G.; Repik, Jason; Rogers, Jim; Salminen, Susanna; Showerman, Mike; Whitney, Cary; Williams, Jim

Monitoring of High Performance Computing (HPC) platforms is critical to successful operations, can provide insights into performance-impacting conditions, and can inform methodologies for improving science throughput. However, monitoring systems are generally considered core capabilities neither in system requirements specifications nor in vendor development strategies. In this paper we present work performed at a number of large-scale HPC sites towards developing monitoring capabilities that fill current gaps in ease of problem identification and root cause discovery. We also present our collective views, based on the experiences presented, on the needs and requirements for enabling vendors or users to develop effective, shareable end-to-end monitoring capabilities.

Type: Conference Poster | Year: 2018

Large-Scale System Monitoring Experiences and Recommendations

Ahlgren, V.A.; Andersson, S.A.; Brandt, James M.; Cardo, N.C.; Chunduri, S.C.; Enos, J.E.; Fields, P.F.; Gentile, Ann C.; Gerber, R.B.; Gienger, M.G.; Greenseid, J.G.; Greiner, A.G.; Hadri, B.H.; He, Y.H.; Hoppe, D.H.; Kaila, U.K.; Kelly, K.K.; Klein, M.K.; Kristiansen, A.K.; Leak, S.L.; Mason, M.M.; Pedretti, Kevin P.; Piccinali, J-G.P.; Repik, Jason; Rogers, J.R.; Salminen, S.S.; Showerman, M.S.; Whitney, C.W.; Williams, J.W.

Abstract not provided.

Type: Conference Poster | Year: 2018

Runtime collection and analysis of system metrics for production monitoring of Trinity Phase II

DeConinck, Adam D.; Nam, Hai A.; Mortin, Dave M.; Bonnie, Amanda B.; Lueninghoener, Cory L.; Brandt, James M.; Gentile, Ann C.; Pedretti, Kevin P.; Agelastos, Anthony M.; Vaughan, Courtenay T.; Hammond, Simon D.; Allan, Benjamin A.; Davis, Michael C.; Repik, Jason

Abstract not provided.

Type: Conference Poster | Year: 2017

Runtime collection and analysis of system metrics for production monitoring of Trinity Phase II (Paper)

DeConinck, Adam D.; Nam, Hai A.; Morton, David P.; Bonnie, Amanda B.; Lueninghoener, Cory L.; Brandt, James M.; Gentile, Ann C.; Pedretti, Kevin P.; Agelastos, Anthony M.; Vaughan, Courtenay T.; Hammond, Simon D.; Allan, Benjamin A.; Davis, Mike D.; Repik, Jason

Abstract not provided.

Type: Conference Poster | Year: 2017
