Publications Search

Application of Performance Analysis Tools on SNL ASC Codes

Agelastos, Anthony M.; Pase, Douglas M.; Amspaugh, Kathleen A.; Dinge, Dennis; Haskell, Karen; Ice, Lisa; Lamb, Justin M.; Rajan, Mahesh; Shaw, Ryan; Stevenson, Joel O.; Brunini, Victor; Clausen, Jonathan; Crawford, Martin J.; Valdez, Greg D.

This milestone 1) exercised a broad set of performance profiling and analysis tools, including tools whose development has been promoted by the ASC program; 2) exercised the tools on two different SNL ASC codes, one Sierra code (Sierra/Aria, a C++ codebase) and one RAMSES code (ITS, a Fortran codebase); and 3) exercised the tools on multiple platforms, including the CTS-1 (e.g., Serrano) and ATS-1 Trinity (e.g., Mutrino) platforms. The milestone generated a large body of strong- and weak-scaling, trend, and profile data for multiple versions and problem cases for each of the two codes. A wealth of experience was gained with the various tools, including identification of problems, an improved understanding of feature sets, enhanced usage documentation, and insights for future tool development. Results are provided from a large number and variety of performance analysis runs with the target codes, together with instructions for how to make use of the tools with the codes.

TYPE SAND Report YEAR 2017

DOI OSTI
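
The strong and weak scaling data mentioned in the abstract above are commonly summarized as parallel efficiencies. The following is a minimal sketch of that arithmetic, not taken from the report; the function names and the timings are hypothetical and purely illustrative.

```python
def strong_scaling_efficiency(t_base, n_base, t_n, n):
    """Efficiency for a fixed total problem size.

    Ideal runtime shrinks in proportion to the process count, so the
    efficiency is (t_base * n_base) / (t_n * n).
    """
    return (t_base * n_base) / (t_n * n)


def weak_scaling_efficiency(t_base, t_n):
    """Efficiency for a fixed problem size per process.

    Ideal runtime stays constant as processes are added.
    """
    return t_base / t_n


# Hypothetical wall-clock times in seconds, keyed by process count.
# These numbers are made up for illustration, not measured results.
strong_runs = {64: 100.0, 128: 55.0, 256: 32.0}
weak_runs = {64: 100.0, 128: 104.0, 256: 110.0}

if __name__ == "__main__":
    n0 = min(strong_runs)
    for n in sorted(strong_runs):
        s = strong_scaling_efficiency(strong_runs[n0], n0, strong_runs[n], n)
        w = weak_scaling_efficiency(weak_runs[n0], weak_runs[n])
        print(f"{n:5d} processes: strong {s:.2f}, weak {w:.2f}")
```

An efficiency near 1.0 indicates near-ideal scaling; values well below 1.0 point to communication, load imbalance, or serial overheads worth profiling.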

Performance on Trinity Phase 2 (a Cray XC40 utilizing Intel Xeon Phi processors) with Acceptance-Applications and Benchmarks

Agelastos, Anthony M.; Rajan, Mahesh; Wichmann, N.; Lin, Paul T.; Baker, R.; Domino, Stefan P.; Draeger, E.; Anderson, S.; Balma, J.; Behling, S.; Berry, M.; Carrier, P.; Davis, M.; Mcmahon, K.; Sandness, D.; Thomas, K.; Warren, S.; Zhu, T.

Abstract not provided.

TYPE Presentation YEAR 2017

OSTI

Performance on Trinity Phase 2 (a Cray XC40 utilizing Intel Xeon Phi processors) with Acceptance Applications and Benchmarks

Agelastos, Anthony M.; Rajan, Mahesh; Wichmann, Nathan; Baker, Randy; Domino, Stefan P.; Draeger, Erik W.; Anderson, Sarah; Balma, Jacob; Behling, S.; Berry, Mike; Carrier, Pierre; Davis, Mike; Mcmahon, Kim; Sandness, D.; Thomas, Kevin; Warren, S.; Zhu, T.

Abstract not provided.

TYPE Conference Poster YEAR 2017

OSTI

Performance on Trinity Phase 2 (a Cray XC40 utilizing Intel Xeon Phi processors) with Acceptance Applications and Benchmarks

Rajan, Mahesh; Agelastos, Anthony M.; Domino, Stefan P.; Wichmann, N.; Baker, R.; Draeger, E.; Anderson, S.; Balma, J.; Behling, S.; Berry, M.; Carrier, P.; Davis, M.; Mcmahon, K.; Sandness, D.; Thomas, K.; Warren, S.; Zhu, T.

Abstract not provided.

TYPE Conference Poster YEAR 2017

OSTI

Comparisons of Application Performance on TLCC2, CTS1, and Trinity

Rajan, Mahesh

Abstract not provided.

TYPE Conference Poster YEAR 2017

OSTI

Performance on Trinity (a Cray XC40) with Acceptance-Applications and Benchmarks

Rajan, Mahesh

Abstract not provided.

TYPE Conference Poster YEAR 2017

OSTI

A study of CTS1 performance

Rajan, Mahesh

Abstract not provided.

TYPE Presentation YEAR 2016

OSTI

Continuous whole-system monitoring toward rapid understanding of production HPC applications and systems

Parallel Computing

Agelastos, Anthony M.; Allan, Benjamin A.; Brandt, James M.; Gentile, Ann C.; Lefantzi, Sophia; Monk, Stephen T.; Ogden, Jeffry B.; Rajan, Mahesh; Stevenson, Joel O.

A detailed understanding of HPC applications’ resource needs and their complex interactions with each other and HPC platform resources is critical to achieving scalability and performance. Such understanding has been difficult to achieve because typical application profiling tools do not capture the behaviors of codes under the potentially wide spectrum of actual production conditions and because typical monitoring tools do not capture system resource usage information with high enough fidelity to gain sufficient insight into application performance and demands. In this paper we present both system and application profiling results based on data obtained through synchronized system-wide monitoring on a production HPC cluster at Sandia National Laboratories (SNL). We demonstrate analytic and visualization techniques that we are using to characterize application and system resource usage under production conditions for better understanding of application resource needs. Our goals are to improve application performance (through understanding application-to-resource mapping and system throughput) and to ensure that future system capabilities match their intended workloads.

TYPE Journal Article YEAR 2016

DOI OSTI Scopus

Trinity: Architecture and Early Experience

Hemmert, Karl S.; Rajan, Mahesh; Hoekstra, Robert J.; Dawson, Shawn; Vigil, Manuel; Grunau, Daryl; Lujan, James; Morton, David; Nam, Hai A.; Peltz Jr., Paul; Torrez, Alfred; Wright, Cornell; Glass, Micheal W.; Hammond, Simon

Abstract not provided.

TYPE Conference Poster YEAR 2016

OSTI

Early Experiences with Trinity - The First Advanced Technology Platform for the ASC Program

Vaughan, Courtenay T.; Dinge, Dennis; Lin, Paul T.; Hammond, Simon; Cook, Jeanine; Trott, Christian R.; Agelastos, Anthony M.; Pase, Douglas M.; Benner, Robert E.; Rajan, Mahesh; Hoekstra, Robert J.

Abstract not provided.

TYPE Conference Poster YEAR 2016

OSTI

Trinity Application Performance Requirement and Status

Rajan, Mahesh

Abstract not provided.

TYPE Presentation YEAR 2016

OSTI

An Investigation of Compiler Vectorization on Current and Next-generation Intel Processors using Benchmarks and Sandia's SIERRA Applications

Rajan, Mahesh; Doerfler, Douglas W.; Tupek, Michael R.; Hammond, Simon

Abstract not provided.

TYPE Conference Poster YEAR 2016

OSTI

Trinity: Architecture and Early Experience

Hemmert, Karl S.; Rajan, Mahesh; Hoekstra, Robert J.; Dawson, Shawn; Vigil, Manuel; Grunau, Daryl; Lujan, James; Morton, David; Nam, Hai A.; Peltz Jr., Paul; Torrez, Alfred; Wright, Cornell

Abstract not provided.

TYPE Conference Poster YEAR 2016

OSTI

Performance on Trinity (a Cray XC40) with Acceptance-Applications and Benchmarks

Rajan, Mahesh; Wichmann, Nathan; Nuss, Cindy; Carrier, Pierre; Olson, Ryan; Anderson, Sarah; Davis, Michael; Baker, Randal; Draeger, Erik; Domino, Stefan; Agelastos, Anthony

Abstract not provided.

TYPE Conference Poster YEAR 2016

OSTI

Performance Efficiency and Effectiveness of Supercomputers

Leland, Robert W.; Rajan, Mahesh; Heroux, Michael A.

Our first purpose here is to offer to a general technical and policy audience a perspective on whether the supercomputing community should focus on improving the efficiency of supercomputing systems and their use rather than on building larger and ostensibly more capable systems that are used at low efficiency. After first summarizing our content and defining some necessary terms, we give a concise answer to this question. We then set this in context by characterizing performance of current supercomputing systems on a variety of benchmark problems and actual problems drawn from workloads in the national security, industrial, and scientific context. Along the way we answer some related questions, identify some important technological trends, and offer a perspective on the significance of these trends. Our second purpose is to give a reasonably broad and transparent overview of the related issue space and thereby to better equip the reader to evaluate commentary and controversy concerning supercomputing performance. For example, questions repeatedly arise concerning the Linpack benchmark and its predictive power, so we consider this in moderate depth as an example. We also characterize benchmark and application performance for scientific and engineering use of supercomputers and offer some guidance on how to think about these. Examples here are drawn from traditional scientific computing. Other problem domains, for example, data analytics, have different performance characteristics that are better captured by different benchmark problems or applications, but the story in those domains is similar in character and leads to similar conclusions with regard to the motivating question.

TYPE SAND Report YEAR 2016

DOI OSTI

Early Experiences with Trinity - The First Advanced Technology Platform for the ASC Program

Vaughan, Courtenay T.; Dinge, Dennis; Lin, Paul T.; Hammond, Simon; Cook, Jeanine; Trott, Christian R.; Agelastos, Anthony M.; Pase, Douglas M.; Benner, Robert E.; Rajan, Mahesh; Hoekstra, Robert J.; Pierson, Kendall H.

Abstract not provided.

TYPE Conference Poster YEAR 2016

OSTI

Performance on Trinity (a Cray XC40) with Acceptance-Applications and Benchmarks

Rajan, Mahesh; Domino, Stefan P.; Agelastos, Anthony M.; Wichmann, Nathan; Nuss, Cindy; Carrier, Pierre; Olson, Ryan; Anderson, Sarah; Davis, Mike; Baker, Randy; Draeger, Erik

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Trinity Acceptance Tests Performance Summary

Rajan, Mahesh

Ensuring that real applications perform well on Trinity is key to success. The acceptance tests have four components: ASC applications, Sustained System Performance (SSP), extra-large mini-application problems, and micro-benchmarks.

TYPE Other Report YEAR 2015

DOI OSTI

ASC Trilab L2 Codesign Milestone 2015

Trott, Christian R.; Hammond, Simon; Dinge, Dennis; Lin, Paul T.; Vaughan, Courtenay T.; Cook, Jeanine; Rajan, Mahesh; Edwards, Harold C.; Hoekstra, Robert J.

For the FY15 ASC L2 Trilab Codesign milestone, Sandia National Laboratories performed two main studies. The first study investigated three topics (performance, cross-platform portability, and programmer productivity) when using OpenMP directives and the RAJA and Kokkos programming models available from LLNL and SNL, respectively. The focus of this first study was the LULESH mini-application developed and maintained by LLNL. In the coming sections of the report the reader will find performance comparisons (and a demonstration of portability) for a variety of mini-application implementations produced during this study with varying levels of optimization. Of note is that the implementations utilized include optimizations across a number of programming models, to help ensure that claims that Kokkos can provide native-class application performance are valid. The second study performed during FY15 is a performance assessment of the MiniAero mini-application developed by Sandia. This mini-application was developed by the SIERRA Thermal-Fluid team at Sandia for the purposes of learning the Kokkos programming model and so is available in only a single implementation. For this report we studied its performance and scaling on a number of machines with the intent of providing insight into potential performance issues that may be experienced when similar algorithms are deployed on the forthcoming Trinity ASC ATS platform.

TYPE SAND Report YEAR 2015

DOI OSTI

ASC L2 Trilab Codesign Milestone (Codesign at Sandia: LULESH and MiniAero)

Cook, Jeanine; Edwards, Harold C.; Dinge, Dennis; Glass, Micheal W.; Hammond, Simon; Hoekstra, Robert J.; Lin, Paul T.; Rajan, Mahesh; Trott, Christian R.; Vaughan, Courtenay T.

Abstract not provided.

TYPE Presentation YEAR 2015

OSTI

Trinity Road Show Programming Environment and Tools

Rajan, Mahesh; Foulk, James W.; Green, Jennifer; Shrader, David

Abstract not provided.

TYPE Presentation YEAR 2015

OSTI

Preparation of Codes for Trinity

Vaughan, Courtenay T.; Rajan, Mahesh; Dinge, Dennis; Dohrmann, Clark R.; Glass, Micheal W.; Franko, Kenneth J.; Pierson, Kendall H.; Tupek, Michael R.

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Toward Rapid Understanding of Production HPC Applications and Systems

Agelastos, Anthony M.; Allan, Benjamin A.; Brandt, James M.; Gentile, Ann C.; Lefantzi, Sophia; Monk, Stephen T.; Ogden, Jeffry B.; Rajan, Mahesh; Stevenson, Joel O.

Abstract not provided.

TYPE Conference Poster YEAR 2015

OSTI

Trinity Benchmarks on the Intel Xeon Phi (Knights Corner)

Rajan, Mahesh; Doerfler, Douglas W.; Hammond, Simon

This report documents early experiences with porting and performance analysis of the Tri-Lab Trinity benchmark applications on the Intel Xeon Phi (Knights Corner, KNC) processor. KNC, the second generation of the Intel Many Integrated Core (MIC) architectures, uses a large number of small P54C-x86 cores with wide vector units and is deployed as PCI bus attached process accelerators. Sandia has experimental test beds of small InfiniBand clusters and workstations to investigate the performance of the MIC architecture. On these experimental test beds the programming models that may be investigated are "offload", "symmetric" and "native". Among these usage models our primary interest is in the so-called "native" mode, because the planned Trinity system, to be deployed in 2016 using the next-generation MIC processor architecture called Knights Landing, would be self-hosted. Trinity / NERSC-8 benchmark programs cover a variety of scientific disciplines, and they were used to guide the procurement of these systems. Architectures such as the Intel MIC are well suited to studying evolving processor architectures and a usage model commonly referred to as MPI + X, which facilitates migration of our applications to use both coarse-grain and fine-grain parallelism. Our focus with the applications selected is on the efficacy of algorithms in these applications in taking advantage of features such as a large number of cores, wide vector units, and a higher-bandwidth, deeper memory subsystem. This is a first step towards understanding applications, algorithms and programming environments for Trinity and future exascale computing systems.

TYPE SAND Report YEAR 2014

DOI OSTI

CoE Meeting Tools Discussion

Rajan, Mahesh; Dinge, Dennis

Abstract not provided.

TYPE Presentation YEAR 2014

OSTI

Preparation of Codes for Trinity

Vaughan, Courtenay T.; Rajan, Mahesh; Dinge, Dennis; Dohrmann, Clark R.; Franko, Kenneth; Glass, Micheal W.; Pierson, Kendall H.; Tupek, Michael R.

Abstract not provided.

TYPE Conference Poster YEAR 2014

OSTI

SIERRA Solid Mechanics Trinity CoE Meeting: SIERRA/SM Profiling

Tupek, Michael R.; Pierson, Kendall H.; Rajan, Mahesh

Abstract not provided.

TYPE Presentation YEAR 2014

OSTI

Trinity Benchmarks on Xeon Phi (Knights Corner)

Rajan, Mahesh; Doerfler, Douglas W.; Hammond, Simon; Trott, Christian R.; Barrett, Richard F.

Abstract not provided.

TYPE Conference Poster YEAR 2014

OSTI

Experiences with Sandia National Laboratories HPC applications and MPI performance

Rajan, Mahesh; Doerfler, Douglas W.; Barrett, Richard F.; Stevenson, Joel O.; Agelastos, Anthony M.; Shaw, Ryan; Meyer, Harold E.

Abstract not provided.

TYPE Conference Poster YEAR 2014

OSTI

Toward Rapid Understanding of Production HPC Applications and Systems

Agelastos, Anthony M.; Allan, Benjamin A.; Brandt, James M.; Gentile, Ann C.; Monk, Stephen T.; Ogden, Jeffry B.; Rajan, Mahesh; Stevenson, Joel O.

Abstract not provided.

TYPE Conference YEAR 2014

DOI OSTI

The Lightweight Distributed Metric Service: A Scalable Infrastructure for Continuous Monitoring of Large Scale Computing Systems and Applications

Agelastos, Anthony M.; Allan, Benjamin A.; Brandt, James M.; Gentile, Ann C.; Monk, Stephen T.; Ogden, Jeffry B.; Rajan, Mahesh; Stevenson, Joel O.

Abstract not provided.

TYPE Conference YEAR 2014

OSTI DOI

The Lightweight Distributed Metric Service: A Scalable Infrastructure for Continuous Monitoring of Large Scale Computing Systems and Applications

International Conference for High Performance Computing, Networking, Storage and Analysis, SC

Agelastos, Anthony M.; Allan, Benjamin A.; Brandt, James M.; Cassella, Paul; Enos, Jeremy; Fullop, Joshi; Gentile, Ann C.; Monk, Stephen T.; Naksinehaboon, Nichamon; Ogden, Jeffry B.; Rajan, Mahesh; Showerman, Michael; Stevenson, Joel O.; Taerat, Narate; Tucker, Thomas O.

Understanding how resources of High Performance Compute platforms are utilized by applications both individually and as a composite is key to application and platform performance. Typical system monitoring tools do not provide sufficient fidelity while application profiling tools do not capture the complex interplay between applications competing for shared resources. To gain new insights, monitoring tools must run continuously, system wide, at frequencies appropriate to the metrics of interest while having minimal impact on application performance. We introduce the Lightweight Distributed Metric Service for scalable, lightweight monitoring of large scale computing systems and applications. We describe issues and constraints guiding deployment in Sandia National Laboratories' capacity computing environment and on the National Center for Supercomputing Applications' Blue Waters platform including motivations, metrics of choice, and requirements relating to the scale and specialized nature of Blue Waters. We address monitoring overhead and impact on application performance and provide illustrative profiling results.

TYPE Conference Poster YEAR 2014

DOI OSTI Scopus

Unprecedented Scalability and Performance of the new NNSA Tri-Lab Capacity Cluster 2 (TLCC2)

Rajan, Mahesh; Doerfler, Douglas W.; Lin, Paul T.; Hammond, Simon; Barrett, Richard F.; Vaughan, Courtenay T.

Abstract not provided.

TYPE Conference YEAR 2012

OSTI

Application Performance and Scaling on the new Tri-Lab Capacity Cluster: TLCC2

Rajan, Mahesh

Abstract not provided.

TYPE Conference YEAR 2012

OSTI

Reliable Computation Using Unpredictable Components

Wilke, Jason; Ballance, Robert A.; Rajan, Mahesh; Kelly, Suzanne M.; Noe, John P.

Abstract not provided.

TYPE Conference YEAR 2012

OSTI

Application-Driven Analysis of Two Generations of Capability Computing Platforms: The Transition to Multicore Processors

Concurrency and Computation: Practice and Experience

Rajan, Mahesh; Vaughan, Courtenay T.; Doerfler, Douglas W.; Barrett, Richard F.; Lin, Paul T.; Pedretti, Kevin; Hemmert, Karl S.

Abstract not provided.

TYPE Journal Article YEAR 2011

OSTI

Application Driven Analysis of Two Generations of Capability Computing Platforms: Purple and Cielo

Rajan, Mahesh; Vaughan, Courtenay T.; Barrett, Richard F.; Doerfler, Douglas W.; Lin, Paul T.; Pedretti, Kevin; Hemmert, Karl S.

Abstract not provided.

TYPE Conference YEAR 2011

OSTI

From Red Storm to Cielo: Performance Analysis of ASC Simulation Programs Across an Evolution of Multicore Architectures

Parallel Processing Letters

Barrett, Richard F.; Vaughan, Courtenay T.; Rajan, Mahesh; Doerfler, Douglas W.; Pedretti, Kevin

Abstract not provided.

TYPE Journal Article YEAR 2011

OSTI

Application-Driven Acceptance of Cielo an XE6 Petascale Capability Platform

Doerfler, Douglas W.; Rajan, Mahesh

Abstract not provided.

TYPE Conference YEAR 2011

OSTI

Copy of Application-driven Analysis of Two Generations of Capability Computing Platforms: Purple and Cielo

Rajan, Mahesh; Vaughan, Courtenay T.; Doerfler, Douglas W.; Lin, Paul T.; Pedretti, Kevin; Hemmert, Karl S.

Abstract not provided.

TYPE Conference YEAR 2011

OSTI

Investigating the Impact of the Cielo Cray XE6 Architecture on Scientific Application Codes

Vaughan, Courtenay T.; Rajan, Mahesh; Barrett, Richard F.; Doerfler, Douglas W.; Pedretti, Kevin

Abstract not provided.

TYPE Conference YEAR 2011

OSTI

Application-Driven Acceptance of Cielo an XE6 Petascale Capability Platform

Doerfler, Douglas W.; Rajan, Mahesh

Abstract not provided.

TYPE Conference YEAR 2011

OSTI

A Comparison of the Performance Characteristics of Capability and Capacity Class HPC Systems

Doerfler, Douglas W.; Rajan, Mahesh; Epperson, Marcus; Vaughan, Courtenay T.; Pedretti, Kevin; Barrett, Richard F.; Barrett, Brian

Abstract not provided.

TYPE Conference YEAR 2011

OSTI

Copy of Investigating the Impact of the Cielo Cray XE6 Architecture on Scientific Application Codes

Vaughan, Courtenay T.; Rajan, Mahesh; Barrett, Richard F.; Doerfler, Douglas W.; Pedretti, Kevin

Abstract not provided.

TYPE Conference YEAR 2011

OSTI

Capability vs Capacity; HPC Systems Application Performance Comparisons: Cielo vs. Red Sky

Doerfler, Douglas W.; Rajan, Mahesh

Abstract not provided.

TYPE Presentation YEAR 2011

OSTI

Application-driven analysis of two generations of capability computing platforms :

Rajan, Mahesh; Vaughan, Courtenay T.; Doerfler, Douglas W.; Lin, Paul T.; Pedretti, Kevin P.

Abstract not provided.

TYPE Conference YEAR 2011

OSTI

Application-driven Analysis of Two Generations of Capability Computing Platforms: Purple and Cielo

Rajan, Mahesh; Vaughan, Courtenay T.; Doerfler, Douglas W.; Lin, Paul T.; Pedretti, Kevin; Barrett, Richard F.; Hemmert, Karl S.

Abstract not provided.

TYPE Conference YEAR 2010

OSTI

Investigating the impact of the Cielo Cray XE6 architecture on scientific application codes

Vaughan, Courtenay T.; Rajan, Mahesh; Barrett, Richard F.; Doerfler, Douglas W.; Pedretti, Kevin P.

Cielo, a Cray XE6, is the Department of Energy NNSA Advanced Simulation and Computing (ASC) campaign's newest capability machine. Rated at 1.37 PFLOPS, it consists of 8,944 dual-socket oct-core AMD Magny-Cours compute nodes, linked using Cray's Gemini interconnect. Its primary mission objective is to enable a suite of the ASC applications implemented using MPI to scale to tens of thousands of cores. Cielo is an evolutionary improvement to a successful architecture previously available to many of our codes, thus enabling a basis for understanding the capabilities of this new architecture. Using three codes strategically important to the ASC campaign, and supplemented with some micro-benchmarks that expose the fundamental capabilities of the XE6, we report on the performance characteristics and capabilities of Cielo.

TYPE Conference YEAR 2010

OSTI
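
The node and core figures quoted in the Cielo abstract above can be checked with a short calculation. The sketch below takes the node count, socket count, and cores per socket from the abstract; the clock rate and the floating-point operations per cycle per core are assumptions that the abstract does not state, chosen here only to show how a peak rating of this kind is typically derived.

```python
NODES = 8944            # from the abstract
SOCKETS_PER_NODE = 2    # "dual-socket"
CORES_PER_SOCKET = 8    # "oct-core"

# Assumptions, not stated in the abstract:
CLOCK_HZ = 2.4e9        # assumed core clock rate
FLOPS_PER_CYCLE = 4     # assumed double-precision flops per cycle per core


def total_cores(nodes=NODES, sockets=SOCKETS_PER_NODE, cores=CORES_PER_SOCKET):
    """Total compute cores across all nodes."""
    return nodes * sockets * cores


def peak_flops(cores, clock_hz=CLOCK_HZ, flops_per_cycle=FLOPS_PER_CYCLE):
    """Theoretical peak = cores x clock rate x flops per cycle."""
    return cores * clock_hz * flops_per_cycle


if __name__ == "__main__":
    cores = total_cores()
    print(f"cores: {cores}")
    print(f"peak:  {peak_flops(cores) / 1e15:.2f} PFLOPS (under the assumptions above)")
```

Under these assumed per-core figures the result is consistent with the 1.37 PFLOPS rating and with the stated goal of scaling MPI applications to tens of thousands of cores.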

HPC application performance and scaling: Understanding trends and future challenges with application benchmarks on past, present and future tri-lab computing systems

AIP Conference Proceedings

Rajan, Mahesh; Doerfler, Douglas W.

In this paper HPC architectural characteristics and their impact on application performance and scaling are investigated. Performance data gathered over several generations of very large HPC systems, such as ASC Red Storm, ASC Purple, and a large InfiniBand cluster, Red Sky, are analyzed. As the number of cache-coherent cores and the number of NUMA domains at a compute node keep increasing, we analyze their impact with a few simple benchmarks and several applications. We present bottlenecks and remedies by examining production applications. We conclude with preliminary early-hardware performance data from the ASC Cielo, a petaFLOPS-class future capability system. © 2010 American Institute of Physics.

TYPE Conference YEAR 2010

OSTI Scopus

How to Analyze the Performance of Parallel Codes?

Rajan, Mahesh

Abstract not provided.

TYPE Presentation YEAR 2010

OSTI

Cielo 6x application acceptance tests

Rajan, Mahesh

Abstract not provided.

TYPE Presentation YEAR 2010

OSTI

Application performance on Sandia's Red Sky computer

Rajan, Mahesh

Abstract not provided.

TYPE Conference YEAR 2010

OSTI

Application performance on the tri-lab Linux capacity cluster - TLCC

International Journal of Distributed Systems and Technologies

Rajan, Mahesh; Doerfler, Douglas W.; Vaughan, Courtenay T.; Epperson, Marcus

In a recent acquisition by DOE/NNSA, several large capacity computing clusters called TLCC have been installed at the DOE labs: SNL, LANL and LLNL. The TLCC architecture, with ccNUMA, multi-socket, multi-core nodes and an InfiniBand interconnect, is representative of the trend in HPC architectures. This paper examines application performance on TLCC, contrasting it with Red Storm/Cray XT4. TLCC and Red Storm share similar AMD processors and memory DIMMs. Red Storm, however, has single-socket nodes and a custom interconnect. Micro-benchmarks and performance analysis tools help explain the causes of the observed performance differences. Control of processor and memory affinity on TLCC with the numactl utility is shown to result in significant performance gains and is essential to attenuate the detrimental impact of OS interference and cache-coherency overhead. While previous studies have investigated the impact of affinity control mostly in the context of small SMP systems, the focus of this paper is on highly parallel MPI applications.

TYPE Presentation YEAR 2010

OSTI Scopus

HPC top 10 InfiniBand Machine: a 3D Torus IB interconnect on Red Sky

Naegle, John H.; Monk, Stephen T.; Schutt, James A.; Doerfler, Douglas W.; Rajan, Mahesh

This presentation discusses the following topics: (1) Red Sky Background; (2) 3D Torus Interconnect Concepts; (3) Difficulties of Torus in IB; (4) New Routing Code for IB a 3D Torus; (5) Red Sky 3D Torus Implementation; and (6) Managing a Large IB Machine. Computing at Sandia: (1) Capability Computing - Designed for scaling of single large runs, Usually proprietary for maximum performance, and Red Storm is Sandia's current capability machine; (2) Capacity Computing - Computing for the masses, 100s of jobs and 100s of users, Extreme reliability required, Flexibility for changing workload, Thunderbird will be decommissioned this quarter, Red Sky is our future capacity computing platform, and Red Mesa machine for National Renewable Energy Lab. Red Sky main themes are: (1) Cheaper - 5X capacity of Tbird at 2/3 the cost, Substantially cheaper per flop than our last large capacity machine purchase; (2) Leaner - Lower operational costs, Three security environments via modular fabric, Expandable, upgradeable, extensible, and Designed for 6-yr. life cycle; and (3) Greener - 15% less power - 1/6th power per flop, 40% less water - 5M gallons saved annually, 10X better cooling efficiency, and 4X denser footprint.

TYPE Conference YEAR 2010

OSTI
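
As background for the 3D torus interconnect concepts listed in the abstract above, the sketch below shows the general idea of wraparound neighbors and minimal hop counts in a 3D torus. It illustrates the topology only and is not the routing code discussed in the presentation; the function names and the 4 x 4 x 4 dimensions are hypothetical.

```python
def torus_neighbors(coord, dims):
    """Return the six neighbors of a node in a 3D torus.

    Each axis wraps around, so a node at the edge of the grid is
    directly linked to the node at the opposite edge.
    """
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % dims[axis]
            neighbors.append(tuple(n))
    return neighbors


def torus_hops(a, b, dims):
    """Minimal hop count between two nodes, taking the shorter way
    around the ring on each axis."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


if __name__ == "__main__":
    dims = (4, 4, 4)  # hypothetical torus size, for illustration only
    print(torus_neighbors((0, 0, 0), dims))
    print(torus_hops((0, 0, 0), (3, 3, 3), dims))
```

The wraparound links are what distinguish a torus from a plain mesh: opposite corners of the grid are only a few hops apart.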

Copy of Predicting AMD Magny-Cours Performance for a Suite of NNSA/ASC Applications

Doerfler, Douglas W.; Rajan, Mahesh; Vaughan, Courtenay T.

Abstract not provided.

TYPE Presentation YEAR 2010

OSTI

CrayPat and GPROF tools

Rajan, Mahesh

Abstract not provided.

TYPE Presentation YEAR 2010

OSTI

Improving performance via mini-applications

Doerfler, Douglas W.; Crozier, Paul; Edwards, Harold C.; Williams, Alan B.; Rajan, Mahesh; Keiter, Eric R.; Thornquist, Heidi K.

Application performance is determined by a combination of many choices: hardware platform, runtime environment, languages and compilers used, algorithm choice and implementation, and more. In this complicated environment, we find that the use of mini-applications - small self-contained proxies for real applications - is an excellent approach for rapidly exploring the parameter space of all these choices. Furthermore, use of mini-applications enriches the interaction between application, library and computer system developers by providing explicit functioning software and concrete performance results that lead to detailed, focused discussions of design trade-offs, algorithm choices and runtime performance issues. In this paper we discuss a collection of mini-applications and demonstrate how we use them to analyze and improve application performance on new and future computer platforms.

TYPE SAND Report YEAR 2009

DOI OSTI

Recent Experiences on Performance and Scalability of SNL Applications on Red Storm and TLCC

Rajan, Mahesh; Doerfler, Douglas W.; Vaughan, Courtenay T.

Abstract not provided.

TYPE Presentation YEAR 2009

OSTI

Investigating the balance between capacity and capability workloads across large scale computing platforms

Rajan, Mahesh; Vaughan, Courtenay T.; Doerfler, Douglas W.; Benner, Robert E.

Abstract not provided.

TYPE Conference YEAR 2008

OSTI

Effect of Noise on All Reduce

Rajan, Mahesh

Abstract not provided.

TYPE Presentation YEAR 2008

OSTI

Benchmarking Multicore Processors

Doerfler, Douglas W.; Rajan, Mahesh; Pedretti, Kevin

Abstract not provided.

TYPE Presentation YEAR 2007

OSTI

Investigating the balance between capacity and capability workloads across large scale computing platforms

Rajan, Mahesh

Abstract not provided.

TYPE Conference YEAR 2007

OSTI

Experiences with the Use of CrayPat in Performance Analysis

Rajan, Mahesh

Abstract not provided.

TYPE Conference YEAR 2007

OSTI

Experiences with the use of CrayPat in performance analysis

Rajan, Mahesh

Abstract not provided.

TYPE Conference YEAR 2007

OSTI

Investigating the balance between capacity and capability workloads across large scale computing platforms

Rajan, Mahesh; Vaughan, Courtenay T.

Abstract not provided.

TYPE Presentation YEAR 2007

OSTI

Supercomputer and cluster performance modeling and analysis efforts: 2004-2006

Ang, James A.; Vaughan, Courtenay T.; Barnette, Daniel W.; Benner, Robert E.; Doerfler, Douglas W.; Ganti, Anand; Phelps, Sue C.; Rajan, Mahesh; Stevenson, Joel O.; Scott, Ryan T.

This report describes efforts by the Performance Modeling and Analysis Team to investigate performance characteristics of Sandia's engineering and scientific applications on the ASC capability and advanced architecture supercomputers, and Sandia's capacity Linux clusters. Efforts to model various aspects of these computers are also discussed. The goals of these efforts are to quantify and compare Sandia's supercomputer and cluster performance characteristics; to reveal strengths and weaknesses in such systems; and to predict performance characteristics of, and provide guidelines for, future acquisitions and follow-on systems. Described herein are the results obtained from running benchmarks and applications to extract performance characteristics and comparisons, as well as modeling efforts, obtained during the time period 2004-2006. The format of the report, with hypertext links to numerous additional documents, purposefully minimizes the document size needed to disseminate the extensive results from our research.

TYPE SAND Report YEAR 2007

DOI OSTI

Performance analysis in support of capability computing (CC) on red storm/XT3

Rajan, Mahesh

Abstract not provided.

TYPE Conference YEAR 2007

OSTI
