Publications Search

Interactive Data Fusion Capabilities for Large-Scale Compute Cluster Architects and Administrators

Brandt, James M.; Chen, Frank X.; De Sapio, Vincent D.; Gentile, Ann C.; Mayo, Jackson M.; Pebay, Philippe P.; Roe, Diana C.; Wong, Matthew H.

Abstract not provided.

TYPE Conference YEAR 2009

OSTI

Resource Health Characterizations for Interactive and Autonomous Proactive System Administration and Scheduling Decisions

Brandt, James M.; Chen, Frank X.; De Sapio, Vincent D.; Gentile, Ann C.; Mayo, Jackson M.; Pebay, Philippe P.; Roe, Diana C.; Wong, Matthew H.

Abstract not provided.

TYPE Conference YEAR 2009

OSTI

Practical reliability and uncertainty quantification in complex systems : final report

Grace, Matthew G.; Red-Horse, John R.; Pebay, Philippe P.; Ringland, James T.; Zurn, Rena M.; Diegert, Kathleen V.

The purpose of this project was to investigate the use of Bayesian methods for the estimation of the reliability of complex systems. The goals were to find methods for dealing with continuous data, rather than simple pass/fail data; to avoid assumptions of specific probability distributions, especially Gaussian, or normal, distributions; to compute not only an estimate of the reliability of the system, but also a measure of the confidence in that estimate; to develop procedures to address time-dependent or aging aspects in such systems; and to use these models and results to derive optimal testing strategies. The system is assumed to be a system of systems, i.e., a system with discrete components that are themselves systems. Furthermore, the system is 'engineered' in the sense that each node is designed to do something and that we have a mathematical description of that process. In the time-dependent case, the assumption is that we have a general, nonlinear, time-dependent function describing the process. The major results of the project are described in this report. In summary, we developed a sophisticated mathematical framework based on modern probability theory and Bayesian analysis. This framework encompasses all aspects of epistemic uncertainty and easily incorporates steady-state and time-dependent systems. Based on Markov chain Monte Carlo methods, we devised a computational strategy for general probability density estimation in the steady-state case. This enabled us to compute a distribution of the reliability from which many questions, including confidence, could be addressed. We then extended this to the time domain and implemented procedures to estimate the reliability over time, including the use of the method to predict the reliability at a future time. Finally, we used certain aspects of Bayesian decision analysis to create a novel method for determining an optimal testing strategy, e.g., we can estimate the 'best' location to take the next test to minimize the risk of making a wrong decision about the fitness of a system. We conclude this report by proposing additional fruitful areas of research.

TYPE SAND Report YEAR 2009

OSTI DOI

Practical Reliability and Uncertainty Quantification in Complex Systems

Grace, Matthew G.; Boggs, Paul T.; Pebay, Philippe P.; Red-Horse, John R.; Ringland, James T.; Zurn, Rena M.

Abstract not provided.

TYPE Conference YEAR 2009

OSTI

Quantifying failure prediction in large scale HPC systems: A case study

Brandt, James M.; Chen, Frank X.; De Sapio, Vincent D.; Gentile, Ann C.; Mayo, Jackson M.; Pebay, Philippe P.; Roe, Diana C.; Wong, Matthew H.

Abstract not provided.

TYPE Conference YEAR 2009

OSTI

Scalable Information Fusion for Fault Tolerance in Large-Scale HPC

Brandt, James M.; Chen, Frank X.; De Sapio, Vincent D.; Gentile, Ann C.; Mayo, Jackson M.; Pebay, Philippe P.; Roe, Diana C.; Wong, Matthew H.

Abstract not provided.

TYPE Conference YEAR 2009

OSTI

Quantifying Failure Prediction in Large Scale HPC Systems: A Case Study

Brandt, James M.; Chen, Frank X.; De Sapio, Vincent D.; Gentile, Ann C.; Mayo, Jackson M.; Pebay, Philippe P.; Roe, Diana C.; Wong, Matthew H.

Abstract not provided.

TYPE Conference YEAR 2009

OSTI

Copy of ParaView Statistics Engines

Pebay, Philippe P.; Fabian, Nathan D.; Roe, Diana C.; Bennett, Janine C.

Abstract not provided.

TYPE Conference YEAR 2009

OSTI

Resource Monitoring and Management with OVIS to Enable HPC in Cloud Computing Environments

Brandt, James M.; Wong, Matthew H.; Gentile, Ann C.; Mayo, Jackson M.; Pebay, Philippe P.; Roe, Diana C.

Abstract not provided.

TYPE Conference YEAR 2009

OSTI

Combining System Characterization and Novel Execution Models to Achieve Scalable Robust Computing

Adalsteinsson, Helgi A.; Brandt, James M.; Gentile, Ann C.; Debusschere, Bert D.; Mayo, Jackson M.; Pebay, Philippe P.; Wong, Matthew H.

Abstract not provided.

TYPE Conference YEAR 2009

OSTI

OVIS 2.0 user's guide

Brandt, James M.; Gentile, Ann C.; Mayo, Jackson M.; Pebay, Philippe P.; Roe, Diana C.; Wong, Matthew H.

This document describes how to obtain, install, use, and enjoy a better life with OVIS version 2.0. The OVIS project targets scalable, real-time analysis of very large data sets. We characterize the behaviors of elements and aggregations of elements (e.g., across space and time) in data sets in order to detect anomalous behaviors. We are particularly interested in determining anomalous behaviors that can be used as advance indicators of significant events of which notification can be made or upon which action can be taken or invoked. The OVIS open source tool (BSD license) is available for download at ovis.ca.sandia.gov. While we intend for it to support a variety of application domains, the OVIS tool was initially developed for, and continues to be primarily tuned for, the investigation of High Performance Compute (HPC) cluster system health. In this application it is intended to be both a system administrator tool for monitoring and a system engineer tool for exploring the system state in depth. OVIS 2.0 provides a variety of statistical tools for examining the behavior of elements in a cluster (e.g., nodes, racks) and associated resources (e.g., storage appliances and network switches). It calculates and reports model values and outliers relative to those models. Additionally, it provides an interactive 3D physical view in which the cluster elements can be colored by raw element values (e.g., temperatures, memory errors) or by the comparison of those values to a given model. The analysis tools and the visual display allow the user to easily determine abnormal or outlier behaviors. The OVIS project envisions the OVIS tool, when applied to compute cluster monitoring, to be used in conjunction with the scheduler or resource manager in order to enable intelligent resource utilization. For example, nodes that are deemed less healthy, that is, nodes that exhibit outlier behavior in some variable, or set of variables, that has been shown to be correlated with future failure, can be discovered and assigned to shorter duration or less important jobs. Further, applications with fault-tolerant capabilities can invoke those mechanisms on demand, based upon notification of a node exhibiting impending failure conditions, rather than performing such mechanisms (e.g., checkpointing) at regular intervals unnecessarily.
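
The idea of reporting outliers relative to a model can be illustrated with a minimal sketch. This is not OVIS code and does not use any OVIS API: the function name `find_outliers`, the node names, the temperature values, and the choice of a simple mean and standard deviation model with a fixed threshold are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def find_outliers(readings, threshold=2.0):
    """Return {node: z_score} for nodes far from the cluster-wide model.

    The "model" here is just the mean and population standard deviation
    of one metric across all nodes (an assumption for illustration).
    """
    values = list(readings.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All nodes report the same value, so nothing stands out.
        return {}
    return {
        node: (value - mu) / sigma
        for node, value in readings.items()
        if abs(value - mu) / sigma > threshold
    }


# Hypothetical per-node temperatures in degrees Celsius.
temperatures = {
    "n1": 50, "n2": 51, "n3": 49, "n4": 50,
    "n5": 52, "n6": 48, "n7": 50, "n8": 80,
}
flagged = find_outliers(temperatures)
```

In a workflow like the one the abstract envisions, the flagged nodes might then be handed to a scheduler so that they receive shorter or less important jobs; that hand-off is not shown here.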

TYPE SAND Report YEAR 2009

OSTI DOI

Dynamic non-overlapping label placement for three-dimensional point-features

Pebay, Philippe P.

Abstract not provided.

TYPE Conference YEAR 2009

OSTI

Computational algebraic geometry for statistical modeling FY09Q2 progress

Pebay, Philippe P.

This is a progress report on polynomial system solving for statistical modeling. This quarter we have developed our first model of shock response data and an algorithm for identifying the chamber cone containing a polynomial system in n variables with n+k terms within polynomial time - a significant improvement over previous algorithms, all having exponential worst-case complexity. We have implemented and verified the chamber cone algorithm for n+3 and are working to extend the implementation to handle arbitrary k. Later sections of this report explain chamber cones in more detail; the next section provides an overview of the project and how the current progress fits into it.

TYPE SAND Report YEAR 2009

OSTI DOI

Improving the hexahedral quality obtained from streaming mesh refinement

Mayo, Jackson M.; Pebay, Philippe P.

Abstract not provided.

TYPE Conference YEAR 2009

OSTI

Methodologies for advance warning of compute cluster problems via statistical analysis : a case study

Brandt, James M.; Gentile, Ann C.; Mayo, Jackson M.; Pebay, Philippe P.; Roe, Diana C.; Wong, Matthew H.

Abstract not provided.

TYPE Conference YEAR 2009

OSTI

Parallel tetrahedral mesh refinement with MOAB

Pebay, Philippe P.

In this report, we present the novel functionality of parallel tetrahedral mesh refinement which we have implemented in MOAB. This report details work done to implement parallel, edge-based, tetrahedral refinement in MOAB. The theoretical basis for this work is contained in [PT04, PT05, TP06], while information on design, performance, and operation specific to MOAB is contained herein. As MOAB is intended mainly for use in pre-processing and simulation (as opposed to the post-processing bent of previous papers), the primary use case is different: rather than refining elements with non-linear basis functions, the goal is to increase the number of degrees of freedom in some region in order to more accurately represent the solution to some system of equations that cannot be solved analytically. Also, MOAB has a unique mesh representation which impacts the algorithm. This introduction contains a brief review of streaming edge-based tetrahedral refinement. The remainder of the report is broken into three sections: design and implementation, performance, and conclusions. Appendix A contains instructions for end users (simulation authors) on how to employ the refiner.

TYPE SAND Report YEAR 2008

OSTI DOI

Scalable descriptive and correlative statistics with Titan

Pebay, Philippe P.

This report summarizes the existing statistical engines in VTK/Titan and presents the parallel versions thereof which have already been implemented. The ease of use of these parallel engines is illustrated by means of C++ code snippets. Furthermore, this report justifies the design of these engines with parallel scalability in mind; then, this theoretical property is verified with test runs that demonstrate optimal parallel speed-up with up to 200 processors.

TYPE SAND Report YEAR 2008

OSTI DOI

Formulas for robust, one-pass parallel computation of covariances and arbitrary-order statistical moments

Pebay, Philippe P.

We present a formula for the pairwise update of arbitrary-order centered statistical moments. This formula is of particular interest to compute such moments in parallel for large-scale, distributed data sets. As a corollary, we indicate a specialization of this formula for incremental updates, of particular interest to streaming implementations. Finally, we provide pairwise and incremental update formulas for the covariance. Centered statistical moments are one of the most widely used tools in descriptive statistics. It is therefore essential for statistical analysis packages that robust and efficient algorithms be devised and implemented. However, robustness and speed of execution, in this context as well as in others, tend to be orthogonal. For instance, it is well known that algorithms for calculating centered statistical moments that utilize sums of powers for the sake of execution speed (one-pass algorithms) lead to unacceptable numerical instability.
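
To make the notion of a pairwise update concrete, here is a minimal sketch restricted to the second-order case (mean and sum of squared deviations), using the standard merge formula for two partitions. It is not taken from the report, which covers arbitrary-order moments and covariances; the names `Moments`, `from_values`, `merge`, and `update` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Moments:
    n: int = 0         # number of observations
    mean: float = 0.0  # running mean
    m2: float = 0.0    # sum of squared deviations from the mean


def from_values(values):
    """Two-pass reference computation for a single partition."""
    n = len(values)
    if n == 0:
        return Moments()
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values)
    return Moments(n, mean, m2)


def merge(a, b):
    """Pairwise update: combine the moments of two disjoint partitions."""
    n = a.n + b.n
    if n == 0:
        return Moments()
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return Moments(n, mean, m2)


def update(a, x):
    """Incremental update: merging with a single-observation partition."""
    return merge(a, Moments(1, float(x), 0.0))


# Two partitions, as might live on two processes.
left = from_values([1.0, 2.0, 3.0, 4.0])
right = from_values([5.0, 6.0, 7.0, 8.0, 9.0])
combined = merge(left, right)
variance = combined.m2 / (combined.n - 1)  # unbiased sample variance
```

Because `merge` only needs the three summary values from each side, partitions can be reduced in any order, and no raw sums of powers are ever formed.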

TYPE SAND Report YEAR 2008

OSTI DOI

A Scheme for Automatic Local Conformal Refinement of Hexahedral Meshes

Mayo, Jackson M.; Pebay, Philippe P.

Abstract not provided.

TYPE Conference YEAR 2008

OSTI

OVIS-2: A Robust Distributed Architecture for Scalable RAS

Brandt, James M.; Gentile, Ann C.; Wong, Matthew H.; Pebay, Philippe P.; Debusschere, Bert D.; Mayo, Jackson M.

Abstract not provided.

TYPE Conference YEAR 2008

OSTI

Mesh Generation and Parallel Mesh Generation for Biomedical Applications

Shepherd, Jason F.; Pebay, Philippe P.

Abstract not provided.

TYPE Conference YEAR 2008

OSTI

The Parallel CUBIT Adaptive Mesh Algorithm Library (pCAMAL) Manual

Pebay, Philippe P.

Abstract not provided.

TYPE Presentation YEAR 2008

OSTI

An Exodus II specification for handling Gauss points

Pebay, Philippe P.; Jortner, Jeffrey N.

This report specifies the way in which Gauss points shall be named and ordered when storing them in an EXODUS II file so that they may be properly interpreted by visualization tools. This naming convention covers hexahedra and tetrahedra. Future revisions of this document will cover quadrilaterals, triangles, and shell elements.

TYPE SAND Report YEAR 2007

OSTI DOI