Publications Search

Exact results and field-theoretic bounds for randomly advected propagating fronts, and implications for turbulent combustion

One of the authors previously conjectured that the wrinkling of propagating fronts by weak random advection increases the bulk propagation rate (turbulent burning velocity) in proportion to the 4/3 power of the advection strength. An exact derivation of this scaling is reported. The analysis shows that the coefficient of this scaling is equal to the energy density of a lower-dimensional Burgers fluid with a white-in-time forcing whose spatial structure is expressed in terms of the spatial autocorrelation of the flow that advects the front. The replica method of field theory has been used to derive an upper bound on the coefficient as a function of the spatial autocorrelation. High-precision numerics show that the bound is usefully sharp. Implications for strongly advected fronts (e.g., turbulent flames) are noted.

TYPE Conference YEAR 2010

OSTI

Using Cloud Constructs and Predictive Analysis to Enable Pre-Failure Process Migration in HPC Systems

Brandt, James M.; Chen, Frank X.; De Sapio, Vincent D.; Gentile, Ann C.; Mayo, Jackson M.; Pebay, Philippe P.; Roe, Diana C.; Wong, Matthew H.

Abstract not provided.

TYPE Conference YEAR 2010

OSTI

Scalable modeling and analysis for resilience

Brandt, James M.; Gentile, Ann C.; Mayo, Jackson M.; Pebay, Philippe P.; Wong, Matthew H.; De Sapio, Vincent D.; Roe, Diana C.

Abstract not provided.

TYPE Conference YEAR 2010

OSTI

Are there observable precursors to HPC platform failures?

Brandt, James M.; De Sapio, Vincent D.; Gentile, Ann C.; Mayo, Jackson M.; Pebay, Philippe P.; Roe, Diana C.; Wong, Matthew H.

Abstract not provided.

TYPE Conference YEAR 2010

OSTI

Are there observable precursors to HPC platform resource failures?

Brandt, James M.; De Sapio, Vincent D.; Gentile, Ann C.; Mayo, Jackson M.; Pebay, Philippe P.; Roe, Diana C.; Wong, Matthew H.

Abstract not provided.

TYPE Conference YEAR 2010

OSTI

SST/macroscale. The other simulator

Adalsteinsson, Helgi A.; Janssen, Curtis L.; Cranford, Scott C.; Dechev, Damian D.; Evensky, David A.; Kenny, Joseph P.; Mayo, Jackson M.; Pinar, Ali P.

Abstract not provided.

TYPE Presentation YEAR 2010

OSTI

A framework for graph-based synthesis, analysis, and visualization of HPC cluster job data

De Sapio, Vincent D.; Brandt, James M.; Gentile, Ann C.; Kegelmeyer, William P.; Mayo, Jackson M.; Pebay, Philippe P.; Roe, Diana C.; Wong, Matthew H.

Abstract not provided.

TYPE Conference YEAR 2010

OSTI

A Simulator for Large-scale Parallel Computer Architectures

Pinar, Ali P.; Janssen, Curtis L.; Adalsteinsson, Helgi A.; Cranford, Scott C.; Kenny, Joseph P.; Evensky, David A.; Mayo, Jackson M.

Abstract not provided.

TYPE Conference YEAR 2010

OSTI

Scalable Information Fusion for Fault Tolerance in Large-Scale HPC

Brandt, James M.; De Sapio, Vincent D.; Gentile, Ann C.; Mayo, Jackson M.; Pebay, Philippe P.; Roe, Diana C.; Wong, Matthew H.

Abstract not provided.

TYPE Conference YEAR 2010

OSTI

Combining Virtualization Resource Characterization and Resource Management to Enable Efficient High Performance Compute Platforms Through Intelligent Dynamic Resource Allocation

Brandt, James M.; Chen, Frank X.; De Sapio, Vincent D.; Gentile, Ann C.; Mayo, Jackson M.; Pebay, Philippe P.; Roe, Diana C.; Wong, Matthew H.

Abstract not provided.

TYPE Conference YEAR 2010

OSTI

Using Cloud Constructs and Predictive Analysis to Enable Pre-Failure Process Migration in HPC Systems

Brandt, James M.; Chen, Frank X.; De Sapio, Vincent D.; Gentile, Ann C.; Mayo, Jackson M.; Pebay, Philippe P.; Roe, Diana C.; Wong, Matthew H.

Abstract not provided.

TYPE Conference YEAR 2009

OSTI

Methodologies for advance warning of compute cluster problems via statistical analysis: A case study

Proceedings of the 2009 Workshop on Resiliency in High Performance, Resilience'09, Co-located with the 2009 International Symposium on High Performance Distributed Computing Conference, HPDC'09

Brandt, James M.; Gentile, Ann C.; Mayo, Jackson M.; Pébay, Philippe; Roe, Diana C.; Thompson, David; Wong, Matthew H.

The ability to predict impending failures (hardware or software) on large-scale high performance compute (HPC) platforms, augmented by checkpoint mechanisms, could drastically increase the scalability of applications and efficiency of platforms. In this paper we present our findings and methodologies employed to date in our search for reliable, advance indicators of failures on a 288-node, 4608-core, Opteron-based cluster in production use at Sandia National Laboratories. In support of this effort we have deployed OVIS, a Sandia-developed scalable HPC monitoring, analysis, and visualization tool designed for this purpose. We demonstrate that for a particular error case, statistical analysis using OVIS would enable advance warning of cluster problems on timescales that would enable application and system administrator response in advance of errors, subsequent system error log reporting, and job failures. This is significant as the utility of detecting such indicators depends on how far in advance of failure they can be recognized and how reliable they are. Copyright 2009 ACM.

TYPE Conference YEAR 2009

Scopus OSTI

Resource monitoring and management with OVIS to enable HPC in cloud computing environments

IPDPS 2009 - Proceedings of the 2009 IEEE International Parallel and Distributed Processing Symposium

Brandt, James M.; Gentile, Ann C.; Mayo, Jackson M.; Pébay, Philippe; Roe, Diana C.; Thompson, David; Wong, Matthew H.

Using the cloud computing paradigm, a host of companies promise to make huge compute resources available to users on a pay-as-you-go basis. These resources can be configured on the fly to provide the hardware and operating system of choice to the customer on a large scale. While the current target market for these resources in the commercial space is web development/hosting, this model has the lure of savings of ownership, operation, and maintenance costs, and thus sounds like an attractive solution for people who currently invest millions to hundreds of millions of dollars annually on High Performance Computing (HPC) platforms in order to support large-scale scientific simulation codes. Given the current interconnect bandwidth and topologies utilized in these commercial offerings, however, the only current viable market in HPC would be small-memory-footprint embarrassingly parallel or loosely coupled applications, which inherently require little to no inter-processor communication. While providing the appropriate resources (bandwidth, latency, memory, etc.) for the HPC community would increase the potential to enable HPC in cloud environments, this would not address the need for scalability and reliability, crucial to HPC applications. Providing for these needs is particularly difficult in commercial cloud offerings where the number of virtual resources can far outstrip the number of physical resources, the resources are shared among many users, and the resources may be heterogeneous. Advanced resource monitoring, analysis, and configuration tools can help address these issues, since they bring the ability to dynamically provide and respond to information about the platform and application state and would enable more appropriate, efficient, and flexible use of the resources key to enabling HPC. Additionally, such tools could be of benefit to non-HPC cloud providers, users, and applications by providing more efficient resource utilization in general. © 2009 IEEE.

TYPE Conference YEAR 2009

Scopus OSTI

Emulytics: Large-Scale Emulation of Botnets

Mayo, Jackson M.; Armstrong, Robert C.

Abstract not provided.

TYPE Conference YEAR 2009

OSTI

Data Fusion and Statistical Analysis: Piercing the Darkness of the Black Box

Brandt, James M.; Chen, Frank X.; De Sapio, Vincent D.; Gentile, Ann C.; Mayo, Jackson M.; Pebay, Philippe P.; Roe, Diana C.; Wong, Matthew H.

Abstract not provided.

TYPE Conference YEAR 2009

OSTI

Interactive Data Fusion Capabilities for Large-Scale Compute Cluster Architects and Administrators

Brandt, James M.; Chen, Frank X.; De Sapio, Vincent D.; Gentile, Ann C.; Mayo, Jackson M.; Pebay, Philippe P.; Roe, Diana C.; Wong, Matthew H.

Abstract not provided.

TYPE Conference YEAR 2009

OSTI

Approaches for scalable modeling and emulation of cyber systems : LDRD final report

Mayo, Jackson M.; Minnich, Ronald G.; Rudish, Don W.; Armstrong, Robert C.

The goal of this research was to combine theoretical and computational approaches to better understand the potential emergent behaviors of large-scale cyber systems, such as networks of approximately 10^6 computers. The scale and sophistication of modern computer software, hardware, and deployed networked systems have significantly exceeded the computational research community's ability to understand, model, and predict current and future behaviors. This predictive understanding, however, is critical to the development of new approaches for proactively designing new systems or enhancing existing systems with robustness to current and future cyber threats, including distributed malware such as botnets. We have developed preliminary theoretical and modeling capabilities that can ultimately answer questions such as: How would we reboot the Internet if it were taken down? Can we change network protocols to make them more secure without disrupting existing Internet connectivity and traffic flow? We have begun to address these issues by developing new capabilities for understanding and modeling Internet systems at scale. Specifically, we have addressed the need for scalable network simulation by carrying out emulations of a network with approximately 10^6 virtualized operating system instances on a high-performance computing cluster - a 'virtual Internet'. We have also explored mappings between previously studied emergent behaviors of complex systems and their potential cyber counterparts. Our results provide foundational capabilities for further research toward understanding the effects of complexity in cyber systems, to allow anticipating and thwarting hackers.

TYPE SAND Report YEAR 2009

OSTI DOI

Resource Health Characterizations for Interactive and Autonomous Proactive System Administration and Scheduling Decisions

Brandt, James M.; Chen, Frank X.; De Sapio, Vincent D.; Gentile, Ann C.; Mayo, Jackson M.; Pebay, Philippe P.; Roe, Diana C.; Wong, Matthew H.

Abstract not provided.

TYPE Conference YEAR 2009

OSTI

Quantifying failure prediction in large scale HPC systems: A case study

Brandt, James M.; Chen, Frank X.; De Sapio, Vincent D.; Gentile, Ann C.; Mayo, Jackson M.; Pebay, Philippe P.; Roe, Diana C.; Wong, Matthew H.

Abstract not provided.

TYPE Conference YEAR 2009

OSTI

Scalable Information Fusion for Fault Tolerance in Large-Scale HPC

Brandt, James M.; Chen, Frank X.; De Sapio, Vincent D.; Gentile, Ann C.; Mayo, Jackson M.; Pebay, Philippe P.; Roe, Diana C.; Wong, Matthew H.

Abstract not provided.

TYPE Conference YEAR 2009

OSTI

Quantifying Failure Prediction in Large Scale HPC Systems: A Case Study

Brandt, James M.; Chen, Frank X.; De Sapio, Vincent D.; Gentile, Ann C.; Mayo, Jackson M.; Pebay, Philippe P.; Roe, Diana C.; Wong, Matthew H.

Abstract not provided.

TYPE Conference YEAR 2009

OSTI

Living with Complexity in Cyber Security

Mayo, Jackson M.

Abstract not provided.

TYPE Conference YEAR 2009

OSTI

Resource Monitoring and Management with OVIS to Enable HPC in Cloud Computing Environments

Brandt, James M.; Wong, Matthew H.; Gentile, Ann C.; Mayo, Jackson M.; Pebay, Philippe P.; Roe, Diana C.

Abstract not provided.

TYPE Conference YEAR 2009

OSTI

Combining System Characterization and Novel Execution Models to Achieve Scalable Robust Computing

Adalsteinsson, Helgi A.; Brandt, James M.; Gentile, Ann C.; Debusschere, Bert D.; Mayo, Jackson M.; Pebay, Philippe P.; Wong, Matthew H.

Abstract not provided.

TYPE Conference YEAR 2009

OSTI

OVIS 2.0 user's guide

Brandt, James M.; Gentile, Ann C.; Mayo, Jackson M.; Pebay, Philippe P.; Roe, Diana C.; Wong, Matthew H.

This document describes how to obtain, install, use, and enjoy a better life with OVIS version 2.0. The OVIS project targets scalable, real-time analysis of very large data sets. We characterize the behaviors of elements and aggregations of elements (e.g., across space and time) in data sets in order to detect anomalous behaviors. We are particularly interested in determining anomalous behaviors that can be used as advance indicators of significant events of which notification can be made or upon which action can be taken or invoked. The OVIS open source tool (BSD license) is available for download at ovis.ca.sandia.gov. While we intend for it to support a variety of application domains, the OVIS tool was initially developed for, and continues to be primarily tuned for, the investigation of High Performance Compute (HPC) cluster system health. In this application it is intended to be both a system administrator tool for monitoring and a system engineer tool for exploring the system state in depth. OVIS 2.0 provides a variety of statistical tools for examining the behavior of elements in a cluster (e.g., nodes, racks) and associated resources (e.g., storage appliances and network switches). It calculates and reports model values and outliers relative to those models. Additionally, it provides an interactive 3D physical view in which the cluster elements can be colored by raw element values (e.g., temperatures, memory errors) or by the comparison of those values to a given model. The analysis tools and the visual display allow the user to easily determine abnormal or outlier behaviors. The OVIS project envisions the OVIS tool, when applied to compute cluster monitoring, to be used in conjunction with the scheduler or resource manager in order to enable intelligent resource utilization. 
For example, nodes that are deemed less healthy, that is, nodes that exhibit outlier behavior in some variable, or set of variables, that has been shown to be correlated with future failure, can be discovered and assigned to shorter duration or less important jobs. Further, applications with fault-tolerant capabilities can invoke those mechanisms on demand, based upon notification of a node exhibiting impending failure conditions, rather than performing such mechanisms (e.g., checkpointing) at regular intervals unnecessarily.

TYPE SAND Report YEAR 2009

OSTI DOI