Scott Larson Nicoll Levy

Scalable System Software

sllevy@sandia.gov

(505) 844-7292

Sandia National Laboratories, New Mexico
P.O. Box 5800
Albuquerque, NM 87185-1319

Biography

I am a Senior Member of Technical Staff in the Scalable System Software department of the Center for Computing Research (CCR). I research system software for next-generation extreme-scale systems. Specifically, I study the impact of system failures, and other sources of performance interference, on the execution of scientific simulations. I am also investigating application performance in power-constrained environments. I earned my Ph.D. from the University of New Mexico, where I worked with Prof. Patrick Bridges in the Scalable Systems Lab. At Sandia, I work with Kurt Ferreira, Patrick Widener and the 9lives research group on improving the resilience and fault tolerance of large-scale parallel systems.

Education

  • Ph.D., Computer Science, University of New Mexico
  • B.S., Electrical Engineering, Cornell University

Publications

Keira Haskins, Patrick Bridges, Kurt Ferreira, Scott Levy, (2021). A Benchmark to Understand Communication Performance in Hybrid MPI and GPU Applications https://www.osti.gov/servlets/purl/1899492 Publication ID: 76415

Keira Haskins, Patrick Bridges, Kurt Ferreira, Scott Levy, (2021). A Benchmark to Understand Communication Performance in Hybrid MPI and GPU Applications https://www.osti.gov/servlets/purl/1899493 Publication ID: 76416

Kurt Ferreira, Scott Levy, (2021). Characterizing Per-node Memory Failures Using Benford's Law https://www.osti.gov/servlets/purl/1886179 Publication ID: 75504

William Marts, Matthew Dosanjh, William Schonbein, Scott Levy, Ryan Grant, Patrick Bridges, (2021). MiniMod: A Modular Miniapplication Benchmarking Framework for HPC https://doi.org/10.1109/Cluster48925.2021.00028 Publication ID: 79517

Scott Levy, Kurt Ferreira, (2021). An Initial Examination of the Effect of Container Resource Constraints on Application Perturbation https://doi.org/10.2172/1869756 Publication ID: 78565

Stephen Olivier, Ronald Brightwell, Kurt Ferreira, Ryan Grant, Scott Levy, Kevin Pedretti, Andrew Younge, (2021). SNL ATDM Software Ecosystem Operating Systems and On-Node Runtime https://www.osti.gov/servlets/purl/1861479 Publication ID: 77902

Ryan Grant, Scott Levy, William Schonbein, (2021). Co-design of System Software for Compute Accelerators and SmartNICs https://www.osti.gov/servlets/purl/1847622 Publication ID: 77227

William Schonbein, Ryan Grant, Scott Levy, Matthew Dosanjh, William Marts, (2020). Low-cost MPI Multithreaded Message Matching Benchmarking https://doi.org/10.2172/1882338 Publication ID: 72015

Ryan Grant, Whit Schonbein, Scott Levy, (2020). RaDD Runtimes: Radical and Different Distributed Runtimes with SmartNICs https://www.osti.gov/servlets/purl/1825981 Publication ID: 71234

Gary Templet Jr., Matthew Glickman, Todd Kordenbrock, Scott Levy, Gerald Lofstead, Jeff Mauldin, Thomas Otahal, Craig Ulmer, Patrick Widener, Ron Oldfield, (2020). FY20 CSSE L2 Milestone 7186 https://www.osti.gov/servlets/purl/1820290 Publication ID: 74812

Ronald Brightwell, Kurt Ferreira, Ryan Grant, Scott Levy, Gerald Lofstead, Stephen Olivier, Kevin Pedretti, Andrew Younge, Ann Gentile, (2020). ALAMO: Autonomous Lightweight Allocation Management and Optimization https://www.osti.gov/servlets/purl/1818044 Publication ID: 74680

Matthew Dosanjh, Ryan Grant, Nathan Hjelm, Scott Levy, William Schonbein, (2019). The Upcoming Storm: The Implications of Increasing Core Count on Scalable System Software https://www.osti.gov/servlets/purl/1762669 Publication ID: 69426

Stephen Olivier, Ronald Brightwell, Kevin Pedretti, Andrew Younge, Noah Evans, Scott Levy, Kurt Ferreira, Ryan Grant, (2019). SNL ATDM Software Ecosystem https://www.osti.gov/servlets/purl/1583026 Publication ID: 64200

Matthew Bettencourt, Richard Kramer, Keith Cartwright, Edward Phillips, Curtis Ober, Roger Pawlowski, Matthew Swan, Irina Tezaur, Eric Phipps, Sidafa Conde, Eric Cyr, Craig Ulmer, Todd Kordenbrock, Scott Levy, Gary Templet, Jonathan Hu, Paul Lin, Christian Glusa, Christopher Siefert, Micheal Glass, (2018). ASC ATDM Level 2 Milestone #6358: Assess Status of Next Generation Components and Physics Models in EMPIRE https://doi.org/10.2172/1493832 Publication ID: 58868

Scott Levy, Kurt Ferreira, Nathan DeBardeleben, Taniya Siddiqua, Vilas Sridharan, Elisabeth Baseman, (2018). Lessons Learned from Errors Observed over the Lifetime of Cielo https://doi.org/10.1109/SC.2018.00046 Publication ID: 63939

Craig Ulmer, Todd Kordenbrock, Margaret Lawson, Scott Levy, Gerald Lofstead, Shyamali Mukherjee, Gregory Sjaardema, Gary Templet, Lee Ward, Patrick Widener, (2018). SNL ATDM: I/O and Data Management https://www.osti.gov/servlets/purl/1806512 Publication ID: 59268

Margaret Lawson, Gerald Lofstead, Scott Levy, Patrick Widener, Craig Ulmer, Shyamali Mukherjee, Gary Templet, Todd Kordenbrock, (2017). EMPRESS-Extensible Metadata PRovider for Extreme-scale Scientific Simulations https://www.osti.gov/servlets/purl/1481718 Publication ID: 54054

Margaret Lawson, Gerald Lofstead, Scott Levy, Patrick Widener, Craig Ulmer, Shyamali Mukherjee, Gary Templet, Todd Kordenbrock, (2017). EMPRESS-Extensible Metadata PRovider for Extreme-scale Scientific Simulations https://www.osti.gov/servlets/purl/1513597 Publication ID: 54449

Craig Ulmer, Shyamali Mukherjee, Gary Templet, Scott Levy, Gerald Lofstead, Patrick Widener, Todd Kordenbrock, Margaret Lawson, (2017). Faodail: Enabling In Situ Analytics for Next-Generation Systems https://www.osti.gov/servlets/purl/1482474 Publication ID: 54217

Kurt Ferreira, Ryan Grant, Michael Levenhagen, Scott Levy, Taylor Groves, (2017). Hardware MPI Message Matching: Insights into MPI Matching Behavior to Inform Design https://doi.org/10.1002/cpe.5150 Publication ID: 54225

Rebecca Kreitinger, Scott Levy, Kurt Ferreira, Patrick Widener, (2017). Spacehog: Evaluating the costs of dedicating resources to in situ analysis https://www.osti.gov/servlets/purl/1478158 Publication ID: 53562

Rebecca Kreitinger, Scott Levy, Kurt Ferreira, Patrick Widener, (2017). Spacehog: Evaluating the costs of dedicating resources to in situ analysis https://www.osti.gov/servlets/purl/1573776 Publication ID: 53563

Craig Ulmer, Ron Oldfield, Todd Kordenbrock, Scott Levy, Gerald Lofstead, Shyamali Mukherjee, Gary Templet, Patrick Widener, (2017). ATDM Data Warehouse: Data Management Services for Exascale Computing https://www.osti.gov/servlets/purl/1466487 Publication ID: 58113

Patrick Widener, Kurt Ferreira, Scott Levy, (2017). It’s not the heat it’s the humidity: scheduling resilience activity at scale https://www.osti.gov/servlets/purl/1367189 Publication ID: 56360

Craig Ulmer, Todd Kordenbrock, Scott Levy, Gerald Lofstead, Shyamali Mukherjee, Gregory Sjaardema, Gary Templet, Patrick Widener, Ron Oldfield, (2017). ATDM Data Warehouse https://www.osti.gov/servlets/purl/1427407 Publication ID: 53054

Scott Levy, Kurt Ferreira, Patrick Bridges, (2016). Improving Application Resilience to Memory Errors with Lightweight Compression https://doi.org/10.1109/SC.2016.27 Publication ID: 51067

Scott Levy, Kurt Ferreira, Patrick Widener, Patrick Bridges, Oscar Mondragon, (2016). Using Simulation to Evaluate the Performance of Resilience Strategies at Scale https://doi.org/10.1007/978-3-319-10214-6_5 Publication ID: 50027

Scott Levy, (2016). Using Rollback Avoidance to Mitigate Failures in Next-Generation Extreme-Scale Systems https://www.osti.gov/biblio/1226922 Publication ID: 41675

Scott Levy, Kurt Ferreira, Patrick Widener, Patrick Bridges, Oscar Mondragon, (2016). How I Learned to Stop Worrying and Love In Situ Analytics: Leveraging latent synchronization in MPI collective algorithms https://www.osti.gov/servlets/purl/1364728 Publication ID: 50139

Scott Levy, Kurt Ferreira, Patrick Bridges, (2016). Similarity Engine: Using Content Similarity to Improve Memory Resilience https://www.osti.gov/servlets/purl/1239385 Publication ID: 46804

Scott Levy, Kurt Ferreira, Patrick Bridges, (2015). Similarity Engine: Using Content Similarity to Improve Memory Resilience https://www.osti.gov/servlets/purl/1530987 Publication ID: 43098

Kurt Ferreira, Scott Levy, Patrick Widener, Dorian Arnold, (2014). Using Machine Learning to Optimize Uncoordinated Checkpointing Performance https://www.osti.gov/servlets/purl/1319751 Publication ID: 39111

Kurt Ferreira, Scott Levy, (2014). Exploring the effect of noise on the performance benefit of non-blocking MPI_Allreduce https://www.osti.gov/servlets/purl/1145671 Publication ID: 40799

Kurt Ferreira, Scott Levy, Patrick Widener, (2014). Understanding the Effects of Communication and Coordination on Checkpointing at Scale https://doi.org/10.1109/SC.2014.77 Publication ID: 40506

Kurt Ferreira, Scott Levy, (2014). Characterizing the Impact of Rollback Avoidance at Extreme-Scale: A Modeling Approach https://www.osti.gov/servlets/purl/1141101 Publication ID: 38842

Scott Levy, Kurt Ferreira, Patrick Widener, (2014). Using simulation to evaluate the performance of resilience strategies and process failures https://doi.org/10.2172/1204092 Publication ID: 36991

Kurt Ferreira, Patrick Widener, Scott Levy, (2014). Understanding the Effects of Communication on Uncoordinated Checkpointing at Scale https://www.osti.gov/servlets/purl/1140761 Publication ID: 36856

Scott Levy, Kurt Ferreira, (2013). Predicting the Impact of Failure Avoidance on Checkpoint/Restart in Extreme-Scale Systems https://www.osti.gov/servlets/purl/1118703 Publication ID: 36607

Kurt Ferreira, Scott Levy, (2013). Predicting Coordinated and Uncoordinated Checkpoint/Restart Protocol Performance at Extreme Scales https://www.osti.gov/servlets/purl/1115087 Publication ID: 36309

Scott Levy, Kurt Ferreira, Patrick Widener, (2013). Using Simulation to Evaluate the Performance of Resilience Strategies at Scale https://doi.org/10.1007/978-3-319-10214-6_5 Publication ID: 35680

Kurt Ferreira, Scott Levy, Ronald Brightwell, (2013). A Holistic Approach to Modeling and Simulation for Resilience and Power Configuration https://www.osti.gov/servlets/purl/1111081 Publication ID: 34214

Kurt Ferreira, Scott Levy, (2013). A Simulation Infrastructure for Examining the Performance of Resilience Strategies at Scale https://www.osti.gov/servlets/purl/1078709 Publication ID: 33050

Kurt Ferreira, Scott Levy, (2013). A simulation infrastructure for examining the performance of resilience strategies at scale https://doi.org/10.2172/1088091 Publication ID: 33098

Scott Levy, (2013). Exploiting Content Similarity to Improve Memory Performance in Large-Scale High-Performance Computing Systems https://www.osti.gov/biblio/1064175 Publication ID: 32294

Kurt Ferreira, Kevin Pedretti, Scott Levy, (2013). Protect Yourself: Why Your OS Must Protect Against DRAM Failures https://www.osti.gov/biblio/1062878 Publication ID: 31581

Scott Levy, Kurt Ferreira, (2013). Using Unreliable Virtual Hardware to Inject Errors in Extreme-Scale Systems https://www.osti.gov/biblio/1063319 Publication ID: 32166

Kurt Ferreira, Aidan Thompson, Christian Trott, Scott Levy, (2013). An examination of content similarity within the memory of HPC applications https://doi.org/10.2172/1088105 Publication ID: 31234
