Scott Larson Nicoll Levy

Scalable System Software

(505) 844-7292

Sandia National Laboratories, New Mexico
P.O. Box 5800
Albuquerque, NM 87185-1319


I am a Senior Member of Technical Staff in the Scalable System Software department of the Center for Computing Research (CCR). I research system software for next-generation extreme-scale systems. Specifically, I study the impact of system failures, and other sources of performance interference, on the execution of scientific simulations. I am also investigating application performance in power-constrained environments. I earned my Ph.D. from the University of New Mexico, where I worked with Prof. Patrick Bridges in the Scalable Systems Lab. At Sandia, I work with Kurt Ferreira, Patrick Widener and the 9lives research group on improving the resilience and fault tolerance of large-scale parallel systems.


  • Ph.D., Computer Science, University of New Mexico
  • B.S., Electrical Engineering, Cornell University


Keira Haskins, Patrick Bridges, Kurt Ferreira, Scott Levy, (2021). A Benchmark to Understand Communication Performance in Hybrid MPI and GPU Applications Publication ID: 76415

Keira Haskins, Patrick Bridges, Kurt Ferreira, Scott Levy, (2021). A Benchmark to Understand Communication Performance in Hybrid MPI and GPU Applications Publication ID: 76416

Kurt Ferreira, Scott Levy, (2021). Characterizing Per-node Memory Failures Using Benford's Law Publication ID: 75504

William Marts, Matthew Dosanjh, William Schonbein, Scott Levy, Ryan Grant, Patrick Bridges, (2021). MiniMod: A Modular Miniapplication Benchmarking Framework for HPC Publication ID: 79517

Scott Levy, Kurt Ferreira, (2021). An Initial Examination of the Effect of Container Resource Constraints on Application Perturbation Publication ID: 78565

Stephen Olivier, Ronald Brightwell, Kurt Ferreira, Ryan Grant, Scott Levy, Kevin Pedretti, Andrew Younge, (2021). SNL ATDM Software Ecosystem Operating Systems and On-Node Runtime Publication ID: 77902

Ryan Grant, Scott Levy, William Schonbein, (2021). Co-design of System Software for Compute Accelerators and SmartNICs Publication ID: 77227

William Schonbein, Ryan Grant, Scott Levy, Matthew Dosanjh, William Marts, (2020). Low-cost MPI Multithreaded Message Matching Benchmarking Publication ID: 72015

Ryan Grant, Whit Schonbein, Scott Levy, (2020). RaDD Runtimes: Radical and Different Distributed Runtimes with SmartNICs Publication ID: 71234

Gary Templet Jr., Matthew Glickman, Todd Kordenbrock, Scott Levy, Gerald Lofstead, Jeff Mauldin, Thomas Otahal, Craig Ulmer, Patrick Widener, Ron Oldfield, (2020). FY20 CSSE L2 Milestone 7186 Publication ID: 74812

Ronald Brightwell, Kurt Ferreira, Ryan Grant, Scott Levy, Gerald Lofstead, Stephen Olivier, Kevin Pedretti, Andrew Younge, Ann Gentile, (2020). ALAMO: Autonomous Lightweight Allocation Management and Optimization Publication ID: 74680

Matthew Dosanjh, Ryan Grant, Nathan Hjelm, Scott Levy, William Schonbein, (2019). The Upcoming Storm: The Implications of Increasing Core Count on Scalable System Software Publication ID: 69426

Stephen Olivier, Ronald Brightwell, Kevin Pedretti, Andrew Younge, Noah Evans, Scott Levy, Kurt Ferreira, Ryan Grant, (2019). SNL ATDM Software Ecosystem Publication ID: 64200

Matthew Bettencourt, Richard Kramer, Keith Cartwright, Edward Phillips, Curtis Ober, Roger Pawlowski, Matthew Swan, Irina Tezaur, Eric Phipps, Sidafa Conde, Eric Cyr, Craig Ulmer, Todd Kordenbrock, Scott Levy, Gary Templet, Jonathan Hu, Paul Lin, Christian Glusa, Christopher Siefert, Micheal Glass, (2018). ASC ATDM Level 2 Milestone #6358: Assess Status of Next Generation Components and Physics Models in EMPIRE Publication ID: 58868

Scott Levy, Kurt Ferreira, Nathan DeBardeleben, Taniya Siddiqua, Vilas Sridharan, Elisabeth Baseman, (2018). Lessons Learned from Errors Observed over the Lifetime of Cielo Publication ID: 63939

Craig Ulmer, Todd Kordenbrock, Margaret Lawson, Scott Levy, Gerald Lofstead, Shyamali Mukherjee, Gregory Sjaardema, Gary Templet, Lee Ward, Patrick Widener, (2018). SNL ATDM: I/O and Data Management Publication ID: 59268

Margaret Lawson, Gerald Lofstead, Scott Levy, Patrick Widener, Craig Ulmer, Shyamali Mukherjee, Gary Templet, Todd Kordenbrock, (2017). EMPRESS-Extensible Metadata PRovider for Extreme-scale Scientific Simulations Publication ID: 54054

Margaret Lawson, Gerald Lofstead, Scott Levy, Patrick Widener, Craig Ulmer, Shyamali Mukherjee, Gary Templet, Todd Kordenbrock, (2017). EMPRESS-Extensible Metadata PRovider for Extreme-scale Scientific Simulations Publication ID: 54449

Craig Ulmer, Shyamali Mukherjee, Gary Templet, Scott Levy, Gerald Lofstead, Patrick Widener, Todd Kordenbrock, Margaret Lawson, (2017). Faodail: Enabling In Situ Analytics for Next-Generation Systems Publication ID: 54217

Kurt Ferreira, Ryan Grant, Michael Levenhagen, Scott Levy, Taylor Groves, (2017). Hardware MPI Message Matching: Insights into MPI Matching Behavior to Inform Design Publication ID: 54225

Rebecca Kreitinger, Scott Levy, Kurt Ferreira, Patrick Widener, (2017). Spacehog: Evaluating the costs of dedicating resources to in situ analysis Publication ID: 53562

Rebecca Kreitinger, Scott Levy, Kurt Ferreira, Patrick Widener, (2017). Spacehog: Evaluating the costs of dedicating resources to in situ analysis Publication ID: 53563

Craig Ulmer, Ron Oldfield, Todd Kordenbrock, Scott Levy, Gerald Lofstead, Shyamali Mukherjee, Gary Templet, Patrick Widener, (2017). ATDM Data Warehouse: Data Management Services for Exascale Computing Publication ID: 58113

Patrick Widener, Kurt Ferreira, Scott Levy, (2017). It’s not the heat it’s the humidity: scheduling resilience activity at scale Publication ID: 56360

Craig Ulmer, Todd Kordenbrock, Scott Levy, Gerald Lofstead, Shyamali Mukherjee, Gregory Sjaardema, Gary Templet, Patrick Widener, Ron Oldfield, (2017). ATDM Data Warehouse Publication ID: 53054

Scott Levy, Kurt Ferreira, Patrick Bridges, (2016). Improving Application Resilience to Memory Errors with Lightweight Compression Publication ID: 51067

Scott Levy, Kurt Ferreira, Patrick Widener, Patrick Bridges, Oscar Mondragon, (2016). Using Simulation to Evaluate the Performance of Resilience Strategies at Scale Publication ID: 50027

Scott Levy, (2016). Using Rollback Avoidance to Mitigate Failures in Next-Generation Extreme-Scale Systems Publication ID: 41675

Scott Levy, Kurt Ferreira, Patrick Widener, Patrick Bridges, Oscar Mondragon, (2016). How I Learned to Stop Worrying and Love In Situ Analytics: Leveraging latent synchronization in MPI collective algorithms Publication ID: 50139

Scott Levy, Kurt Ferreira, Patrick Bridges, (2016). Similarity Engine: Using Content Similarity to Improve Memory Resilience Publication ID: 46804

Scott Levy, Kurt Ferreira, Patrick Bridges, (2015). Similarity Engine: Using Content Similarity to Improve Memory Resilience Publication ID: 43098

Kurt Ferreira, Scott Levy, Patrick Widener, Dorian Arnold, (2014). Using Machine Learning to Optimize Uncoordinated Checkpointing Performance Publication ID: 39111

Kurt Ferreira, Scott Levy, (2014). Exploring the effect of noise on the performance benefit of non-blocking MPI_Allreduce Publication ID: 40799

Kurt Ferreira, Scott Levy, Patrick Widener, (2014). Understanding the Effects of Communication and Coordination on Checkpointing at Scale Publication ID: 40506

Kurt Ferreira, Scott Levy, (2014). Characterizing the Impact of Rollback Avoidance at Extreme-Scale: A Modeling Approach Publication ID: 38842

Scott Levy, Kurt Ferreira, Patrick Widener, (2014). Using simulation to evaluate the performance of resilience strategies and process failures Publication ID: 36991

Kurt Ferreira, Patrick Widener, Scott Levy, (2014). Understanding the Effects of Communication on Uncoordinated Checkpointing at Scale Publication ID: 36856

Scott Levy, Kurt Ferreira, (2013). Predicting the Impact of Failure Avoidance on Checkpoint/Restart in Extreme-Scale Systems Publication ID: 36607

Kurt Ferreira, Scott Levy, (2013). Predicting Coordinated and Uncoordinated Checkpoint/Restart Protocol Performance at Extreme Scales Publication ID: 36309

Scott Levy, Kurt Ferreira, Patrick Widener, (2013). Using Simulation to Evaluate the Performance of Resilience Strategies at Scale Publication ID: 35680

Kurt Ferreira, Scott Levy, Ronald Brightwell, (2013). A Holistic Approach to Modeling and Simulation for Resilience and Power Configuration Publication ID: 34214

Kurt Ferreira, Scott Levy, (2013). A Simulation Infrastructure for Examining the Performance of Resilience Strategies at Scale Publication ID: 33050

Kurt Ferreira, Scott Levy, (2013). A simulation infrastructure for examining the performance of resilience strategies at scale Publication ID: 33098

Scott Levy, (2013). Exploiting Content Similarity to Improve Memory Performance in Large-Scale High-Performance Computing Systems Publication ID: 32294

Kurt Ferreira, Kevin Pedretti, Scott Levy, (2013). Protect Yourself: Why Your OS Must Protect Against DRAM Failures Publication ID: 31581

Scott Levy, Kurt Ferreira, (2013). Using Unreliable Virtual Hardware to Inject Errors in Extreme-Scale Systems Publication ID: 32166

Kurt Ferreira, Aidan Thompson, Christian Trott, Scott Levy, (2013). An examination of content similarity within the memory of HPC applications Publication ID: 31234
