Publications

Results 26–50 of 123
Skip to search filters

The pitfalls of provisioning exascale networks: A trace replay analysis for understanding communication performance

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Kenny, Joseph P.; Sargsyan, Khachik S.; Knight, Samuel K.; Michelogiannakis, George; Wilke, Jeremiah J.

Data movement is considered the main performance concern for exascale, including both on-node memory and off-node network communication. Indeed, many application traces show significant time spent in MPI calls, potentially indicating that faster networks must be provisioned for scalability. However, equating MPI times with network communication delays ignores synchronization delays and software overheads independent of network hardware. Using point-to-point protocol details, we explore the decomposition of MPI time into communication, synchronization and software stack components using architecture simulation. Detailed validation using Bayesian inference is used to identify the sensitivity of performance to specific latency/bandwidth parameters for different network protocols and to quantify associated uncertainties. The inference combined with trace replay shows that synchronization and MPI software stack overhead are at least as important as the network itself in determining time spent in communication routines.

More Details

Compiler-assisted source-to-source skeletonization of application models for system simulation

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Wilke, Jeremiah J.; Kenny, Joseph P.; Knight, Samuel K.; Rumley, Sebastien

Performance modeling of networks through simulation requires application endpoint models that inject traffic into the simulation models. Endpoint models today for system-scale studies consist mainly of post-mortem trace replay, but these off-line simulations may lack flexibility and scalability. On-line simulations running so-called skeleton applications run reduced versions of an application that generate traffic that is the same or similar to the full application. These skeleton apps have advantages for flexibility and scalability, but they often must be custom written for the simulator itself. Auto-skeletonization of existing application source code via compiler tools would provide endpoint models with minimal development effort. These source-to-source transformations have been only narrowly explored. We introduce a pragma language and corresponding Clang-driven source-to-source compiler that performs auto-skeletonization based on provided pragma annotations. We describe the compiler toolchain, validate the generated skeletons, and show scalability of the generated simulation models beyond 100Â K endpoints for example MPI applications. Overall, we assert that our proposed auto-skeletonization approach and the flexible skeletons it produces can be an important tool in realizing balanced exascale interconnect designs.

More Details
Results 26–50 of 123
Results 26–50 of 123