Publications

Results 26–48 of 48

SNL software manual for the ACS Data Analytics Project

Stearley, Jon S.; Robinson, David G.; Hooper, Russell H.; Stickland, Michael S.; McLendon, William C.; Rodrigues, Arun

In the ACS Data Analytics Project (also known as 'YumYum'), a supercomputer is modeled as a graph of components and dependencies, jobs and faults are simulated, and component fault rates are estimated using the graph structure and job pass/fail outcomes. This report documents the successful completion of all SNL deliverables and tasks, describes the software written by SNL for the project, and presents the data it generates. Readers should understand what the software tools are, how they fit together, and how to use them to reproduce the presented data and additional experiments as desired. The SNL YumYum tools provide the novel simulation and inference capabilities desired by ACS. SNL also developed and implemented a new algorithm, which provides faster estimates, at finer component granularity, on arbitrary directed acyclic graphs.
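
As a rough illustration of the kind of simulation-plus-inference the abstract describes, the following is a toy Python sketch, not the SNL YumYum tools; the component names, dependency map, and fault probabilities are invented. The naive attribution used here is biased upward because every component a failed job touched gets blamed, whereas the algorithm described in the report performs proper inference over the dependency graph.

# Toy sketch (not the SNL YumYum tools): simulate jobs over a component
# dependency structure and naively estimate per-component fault probabilities
# from job pass/fail outcomes. Names and rates are illustrative.
import random
from collections import defaultdict

random.seed(0)

# Hypothetical dependencies: job type -> components it depends on (flattened).
dependencies = {
    "job_type_A": ["node1", "switch1", "psu1"],
    "job_type_B": ["node2", "switch1", "psu1"],
    "job_type_C": ["node1", "node2", "switch2", "psu1"],
}
true_fault_prob = {"node1": 0.02, "node2": 0.01, "switch1": 0.005,
                   "switch2": 0.03, "psu1": 0.001}

used = defaultdict(int)    # jobs that touched each component
failed = defaultdict(int)  # failed jobs that touched each component

for _ in range(50_000):
    jtype = random.choice(list(dependencies))
    comps = dependencies[jtype]
    job_failed = any(random.random() < true_fault_prob[c] for c in comps)
    for c in comps:
        used[c] += 1
        failed[c] += job_failed

for c in sorted(true_fault_prob):
    est = failed[c] / used[c]
    print(f"{c}: naive estimate {est:.4f} vs true {true_fault_prob[c]:.4f}")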

More Details

Tracking topic birth and death in LDA

Wilson, Andrew T.; Robinson, David G.

Most topic modeling algorithms that address the evolution of documents over time use the same number of topics at all times. This obscures the common occurrence in the data where new subjects arise and old ones diminish or disappear entirely. We propose an algorithm to model the birth and death of topics within an LDA-like framework. The user selects an initial number of topics, after which new topics are created and retired without further supervision. Our approach also accommodates many of the acceleration and parallelization schemes developed in recent years for standard LDA.

In recent years, topic modeling algorithms such as latent semantic analysis (LSA) [17], latent Dirichlet allocation (LDA) [10] and their descendants have offered a powerful way to explore and interrogate corpora far too large for any human to grasp without assistance. Using such algorithms we are able to search for similar documents, model and track the volume of topics over time, search for correlated topics or model them with a hierarchy. Most of these algorithms are intended for use with static corpora where the number of documents and the size of the vocabulary are known in advance. Moreover, almost all current topic modeling algorithms fix the number of topics as one of the input parameters and keep it fixed across the entire corpus. While this is appropriate for static corpora, it becomes a serious handicap when analyzing time-varying data sets where topics come and go as a matter of course. This is doubly true for online algorithms that may not have the option of revising earlier results in light of new data. To be sure, these algorithms will account for changing data one way or another, but without the ability to adapt to structural changes such as entirely new topics they may do so in counterintuitive ways.
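
For readers unfamiliar with the baseline the paper builds on, here is a minimal Python sketch of per-time-slice LDA (using gensim) with a crude Hellinger-distance matching of topics across slices. It only illustrates where "birth" and "death" candidates would show up and is not the algorithm proposed in the paper; the toy corpus and slice structure are invented.

# Illustrative only: plain per-slice LDA with a crude topic-matching heuristic,
# not the birth/death algorithm proposed in the paper.
import numpy as np
from gensim import corpora, models

# Hypothetical toy corpus split into two time slices.
slices = [
    [["network", "packet", "router"], ["packet", "loss", "router"]],
    [["topic", "model", "lda"], ["network", "router", "packet"]],
]

dictionary = corpora.Dictionary(doc for sl in slices for doc in sl)
topic_sets = []
for docs in slices:
    bow = [dictionary.doc2bow(d) for d in docs]
    lda = models.LdaModel(bow, num_topics=2, id2word=dictionary, random_state=0)
    topic_sets.append(lda.get_topics())   # shape: (num_topics, vocab_size)

def hellinger(p, q):
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

# A topic in slice t with no close match in slice t+1 is a candidate "death";
# an unmatched topic in slice t+1 is a candidate "birth".
for i, old in enumerate(topic_sets[0]):
    dists = [hellinger(old, new) for new in topic_sets[1]]
    print(f"slice-0 topic {i}: closest slice-1 topic at Hellinger {min(dists):.3f}")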

More Details

A Model-Based Case for Redundant Computation

Stearley, Jon S.; Robinson, David G.; Ferreira, Kurt

Despite its seemingly nonsensical cost, we show through modeling and simulation that redundant computation merits full consideration as a resilience strategy for next-generation systems. Without revolutionary breakthroughs in failure rates, part counts, or stable-storage bandwidths, it has been shown that the utility of Exascale systems will be crushed by the overheads of traditional checkpoint/restart mechanisms. Alternate resilience strategies must be considered, and redundancy is a proven approach in many domains. We develop a distribution-independent model for job interrupts on systems of arbitrary redundancy, adapt Daly's model for total application runtime, and find that his estimate for the optimal checkpoint interval remains valid for redundant systems. We then identify conditions where redundancy is more cost-effective than non-redundancy. These analyses are carried out in the context of the number-one supercomputers of the last decade, showing that thorough consideration of redundant computation is timely, if not overdue.
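
A minimal sketch of how Daly's first-order estimate for the optimal checkpoint interval, tau_opt ~ sqrt(2*delta*M) - delta, responds to the mean time to interrupt M, which redundancy lengthens. The checkpoint time and MTTI values below are assumptions for illustration, not figures from the paper.

# Sketch of Daly's first-order optimal checkpoint interval, where delta is the
# checkpoint write time and M is the mean time to interrupt. Numbers are illustrative.
import math

def daly_optimal_interval(delta_s, mtti_s):
    """First-order approximation of the optimal checkpoint interval (seconds)."""
    return math.sqrt(2.0 * delta_s * mtti_s) - delta_s

checkpoint_time = 15 * 60           # 15-minute checkpoint write (assumed)
mtti_plain = 8 * 3600               # 8-hour system MTTI without redundancy (assumed)
mtti_redundant = 30 * 24 * 3600     # redundancy pushing interrupts out to ~30 days (assumed)

for label, mtti in [("non-redundant", mtti_plain), ("redundant", mtti_redundant)]:
    tau = daly_optimal_interval(checkpoint_time, mtti)
    print(f"{label}: checkpoint every {tau / 3600:.1f} h")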

More Details

Assessing the Near-Term Risk of Climate Uncertainty: Interdependencies among the U.S. States

Backus, George A.; Trucano, Timothy G.; Robinson, David G.; Adams, Brian M.; Richards, Elizabeth H.; Siirola, John D.; Boslough, Mark B.; Taylor, Mark A.; Conrad, Stephen H.; Kelic, Andjelka; Roach, Jesse D.; Warren, Drake E.; Ballantine, Marissa D.; Stubblefield, W.A.; Snyder, Lillian A.; Finley, Ray E.; Horschel, Daniel S.; Ehlen, Mark E.; Klise, Geoffrey T.; Malczynski, Leonard A.; Stamber, Kevin L.; Tidwell, Vincent C.; Vargas, Vanessa N.; Zagonel, Aldo A.

Abstract not provided.

Statistical language analysis for automatic exfiltration event detection

Robinson, David G.

This paper discusses the recent development of a statistical approach for the automatic identification of anomalous network activity that is characteristic of exfiltration events. The approach is based on the language processing method referred to as latent Dirichlet allocation (LDA). Cyber security experts currently depend heavily on a rule-based framework for initial detection of suspect network events. The application of the rule set typically results in an extensive list of suspect network events that are then further explored manually for suspicious activity. The ability to identify anomalous network events is heavily dependent on the experience of the security personnel wading through the network log. Limitations of this approach are clear: rule-based systems only apply to exfiltration behavior that has previously been observed, and experienced cyber security personnel are rare commodities. Since the new methodology is not a discrete rule-based approach, it is more difficult for an insider to disguise the exfiltration events. A further benefit is that the methodology provides a risk-based approach that can be implemented in a continuous, dynamic or evolutionary fashion. This permits suspect network activity to be identified early, with a quantifiable risk associated with decision making when responding to suspicious activity.
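
One way to make the idea concrete (a hedged sketch, not the method developed in the paper): fit LDA to documents summarizing normal network activity and flag new event documents whose likelihood under the fitted model is unusually low. The vocabulary, documents, and scoring below are invented for illustration, and gensim is assumed as the LDA implementation.

# Hedged sketch, not the paper's method: fit LDA on "normal" traffic summaries and
# flag new event documents whose per-word likelihood under the model is unusually low.
from gensim import corpora, models

normal_docs = [
    ["dns", "query", "internal", "host"],
    ["http", "get", "internal", "proxy"],
    ["dns", "query", "proxy", "host"],
]
new_doc = ["dns", "query", "upload", "external"]   # hypothetical suspect event

dictionary = corpora.Dictionary(normal_docs)
corpus = [dictionary.doc2bow(d) for d in normal_docs]
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, random_state=0)

baseline = lda.log_perplexity(corpus)
suspect = lda.log_perplexity([dictionary.doc2bow(new_doc)])
print(f"baseline per-word bound: {baseline:.2f}, suspect document: {suspect:.2f}")
# A markedly lower bound for the suspect document would be one signal to escalate.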

More Details

Methodology assessment and recommendations for the Mars science laboratory launch safety analysis

Bessette, Gregory B.; Lipinski, Ronald J.; Bixler, Nathan E.; Hewson, John C.; Robinson, David G.; Potter, Donald L.; Atcitty, Christopher B.; Dodson, Brian W.; Maclean, Heather J.; Sturgis, Beverly R.

The Department of Energy has assigned to Sandia National Laboratories the responsibility of producing a Safety Analysis Report (SAR) for the plutonium-dioxide fueled Multi-Mission Radioisotope Thermoelectric Generator (MMRTG) proposed to be used in the Mars Science Laboratory (MSL) mission. The National Aeronautics and Space Administration (NASA) is anticipating a launch in fall of 2009, and the SAR will play a critical role in the launch approval process. As in past safety evaluations of MMRTG missions, a wide range of potential accident conditions differing widely in probability and severity must be considered, and the resulting risk to the public will be presented in the form of probability distribution functions of health effects in terms of latent cancer fatalities. The basic descriptions of accident cases will be provided by NASA in the MSL SAR Databook for the mission, and on the basis of these descriptions, Sandia will apply a variety of sophisticated computational simulation tools to evaluate the potential release of plutonium dioxide, its transport to human populations, and the consequent health effects. The first step in carrying out this project is to evaluate the existing computational analysis tools (computer codes) for suitability to the analysis and, when appropriate, to identify areas where modifications or improvements are warranted.

The overall calculation of health risks can be divided into three levels of analysis. Level A involves detailed simulations of the interactions of the MMRTG or its components with the broad range of insults (e.g., shrapnel, blast waves, fires) posed by the various accident environments. There are a number of candidate codes for this level; they are typically high-resolution computational simulation tools that capture details of each type of interaction and that can predict damage and plutonium dioxide release for a range of choices of controlling parameters. Level B utilizes these detailed results to study many thousands of possible event sequences and to build up a statistical representation of the releases for each accident case. A code to carry out this process will have to be developed or adapted from previous MMRTG missions. Finally, Level C translates the release (or 'source term') information from Level B into public risk by applying models for atmospheric transport and the health consequences of exposure to the released plutonium dioxide. A number of candidate codes for this level of analysis are available.

This report surveys the range of available codes and tools for each of these levels and makes recommendations for which choices are best for the MSL mission. It also identifies areas where improvements to the codes are needed. In some cases a second tier of codes may be identified to provide supporting or clarifying insight about particular issues. The main focus of the methodology assessment is to identify a suite of computational tools that can produce a high-quality SAR that can be successfully reviewed by external bodies (such as the Interagency Nuclear Safety Review Panel) on the schedule established by NASA and DOE.
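
The Level B step described above lends itself to a simple illustration: Monte Carlo sampling over hypothetical event sequences to build a statistical picture of the release. The sketch below is purely notional; the distributions, the damage-to-release mapping, and all numbers are invented placeholders with no relation to the actual MSL analysis.

# Notional Monte Carlo sketch of the Level B idea: sample many hypothetical accident
# event sequences and summarize the resulting release distribution. All placeholders.
import random

random.seed(1)
releases = []
for _ in range(100_000):
    impact_velocity = random.lognormvariate(3.0, 0.5)   # placeholder insult severity
    fire = random.random() < 0.3                          # placeholder fire probability
    damage = min(1.0, impact_velocity / 100.0) + (0.15 if fire else 0.0)
    release_fraction = max(0.0, damage - 0.15) * 1e-3     # placeholder damage-to-release map
    releases.append(release_fraction)

releases.sort()
n = len(releases)
print("median release fraction:", releases[n // 2])
print("95th percentile release fraction:", releases[int(0.95 * n)])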

More Details

Impact of distributed energy resources on the reliability of a critical telecommunications facility

Robinson, David G.; Atcitty, Christopher B.; Zuffranieri, Jason Z.

This report documents a probabilistic risk assessment of an existing power supply system at a large telecommunications office. The focus is on characterizing the increase in the reliability of the power supply through the use of two alternative power configurations. Telecommunications has been identified by the Department of Homeland Security as a critical infrastructure to the United States. Failures in the power systems supporting major telecommunications service nodes are a main contributor to major telecommunications outages. A logical approach to improving the robustness of telecommunication facilities would be to increase the depth and breadth of technologies available to restore power in the face of power outages. Distributed energy resources such as fuel cells and gas turbines could provide an additional onsite electric power source for backup power should batteries and diesel generators fail. The analysis is based on a hierarchical Bayesian approach and focuses on the failure probability associated with each of three possible facility configurations, along with an assessment of the uncertainty, or confidence level, in the probability of failure. A risk-based characterization of the final best configuration is presented.
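
As a simplified illustration of the Bayesian flavor of this analysis (a Beta-Binomial sketch, not the hierarchical model actually used in the report), one can compute posterior failure probabilities with credible intervals for a few hypothetical configurations. The configuration names and demand/failure counts below are invented.

# Simplified Beta-Binomial sketch: posterior failure probabilities for three
# hypothetical facility configurations, with uncertainty as credible intervals.
import numpy as np

rng = np.random.default_rng(0)
# (demands, observed failures) for each hypothetical configuration
data = {"batteries+diesel": (200, 6), "add fuel cell": (150, 2), "add gas turbine": (120, 1)}

for name, (n, k) in data.items():
    # Beta(1, 1) prior updated with k failures in n demands
    samples = rng.beta(1 + k, 1 + n - k, size=100_000)
    lo, hi = np.percentile(samples, [2.5, 97.5])
    print(f"{name}: posterior mean {samples.mean():.3f}, 95% interval [{lo:.3f}, {hi:.3f}]")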

More Details

A Bayesian approach for health monitoring of critical systems

Robinson, David G.; Zuffranieri, Jason Z.

Bayesian medical monitoring is a concept based on using real-time performance-related data to make statistical predictions about a patient's future health. The following paper discusses the fundamentals behind the medical monitoring concept and the application to monitoring the health of nuclear reactors. Necessary assumptions are discussed regarding distributions and failure-rate calculations. A simple example is performed to illustrate the effectiveness of the methods. The methods perform very well for the thirteen subjects in the example, with a clear failure sequence identified for eleven of the subjects.
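
A minimal sketch of the general Bayesian-updating idea (not the paper's model): a conjugate Gamma-Poisson update of a failure rate as monitoring data arrive in windows. The prior, window length, and event counts are invented for illustration.

# Minimal sketch: Gamma prior on a failure rate, updated as monitoring windows arrive.
import numpy as np

rng = np.random.default_rng(2)
alpha, beta = 1.0, 100.0          # Gamma prior on failure rate (per hour), assumed
true_rate = 0.03                  # hidden "true" rate used only to generate toy data

for window in range(1, 6):
    hours = 100.0
    events = rng.poisson(true_rate * hours)   # anomalies observed this window
    alpha += events                            # conjugate Gamma-Poisson update
    beta += hours
    post_mean = alpha / beta
    print(f"window {window}: posterior mean rate {post_mean:.4f} /h "
          f"(events this window: {events})")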

More Details

Vulnerability of critical infrastructures : identifying critical nodes

Robinson, David G.; Cox, Roger G.

The objective of this research was the development of tools and techniques for the identification of critical nodes within critical infrastructures. These are nodes that, if disrupted through natural events or terrorist action, would cause the most widespread, immediate damage. This research focuses on one particular element of the national infrastructure: the bulk power system. Through the identification of critical elements and the quantification of the consequences of their failure, site-specific vulnerability analyses can be focused at those locations where additional security measures could be effectively implemented. In particular, with appropriate sizing and placement within the grid, distributed generation in the form of regional power parks may reduce or even prevent the impact of widespread network power outages. Even without additional security measures, increased awareness of sensitive power grid locations can provide a basis for more effective national, state and local emergency planning. A number of methods for identifying critical nodes were investigated: small-world (or network) theory, polyhedral dynamics, and an artificial-intelligence-based search method, particle swarm optimization (PSO). PSO was found to be the only viable approach and was applied to a variety of industry-accepted test networks to validate the ability of the approach to identify sets of critical nodes. The approach was coded in a software package called Buzzard and integrated with a traditional power flow code. The techniques (and software) are not unique to power grid networks, but could be applied to a variety of complex, interacting infrastructures.
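
To give a feel for the search problem, the following toy Python sketch uses a crude PSO-flavored heuristic (particles drifting toward the global best, far simpler than a real binary PSO and unrelated to the Buzzard code) to look for a small node set whose removal most fragments a stand-in network. Real grid analysis would score candidates with a power-flow model rather than simple connectivity; the graph, swarm size, and iteration counts here are arbitrary.

# Toy PSO-flavored search for a node set whose removal fragments a test network.
import random
import networkx as nx

random.seed(3)
G = nx.barabasi_albert_graph(60, 2)   # stand-in test network
K = 3                                  # number of critical nodes sought
nodes = list(G.nodes)

def impact(node_set):
    """Fragmentation from removing node_set: 1 - (largest remaining component / n)."""
    H = G.copy()
    H.remove_nodes_from(node_set)
    largest = max(len(c) for c in nx.connected_components(H))
    return 1.0 - largest / G.number_of_nodes()

particles = [set(random.sample(nodes, K)) for _ in range(20)]
best_set, best_score = set(particles[0]), impact(particles[0])

for _ in range(100):
    for i, p in enumerate(particles):
        score = impact(p)
        if score > best_score:
            best_set, best_score = set(p), score
        new = set(p)
        # crude "velocity": swap one node toward the global best with some probability
        leave, enter = sorted(new - best_set), sorted(best_set - new)
        if leave and enter and random.random() < 0.7:
            new.remove(random.choice(leave))
            new.add(random.choice(enter))
        # occasional random exploration, keeping the set size fixed at K
        if random.random() < 0.2:
            new.remove(random.choice(sorted(new)))
            new.add(random.choice([v for v in nodes if v not in new]))
        particles[i] = new

print("candidate critical nodes:", sorted(best_set),
      "fragmentation score:", round(best_score, 3))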

More Details

A Modeling Approach for Predicting the Effect of Corrosion on Electrical-Circuit Reliability

Braithwaite, J.W.; Sorensen, Neil R.; Robinson, David G.; Chen, Ken S.; Bogdan, Carolyn W.

An analytical capability is being developed that can be used to predict the effect of corrosion on the performance of electrical circuits and systems. The availability of this 'toolset' will dramatically improve our ability to influence device and circuit design, address and remediate field occurrences, and determine real limits for circuit service life. In pursuit of this objective, we have defined and adopted an iterative, statistical-based, top-down approach that will permit very formidable and real obstacles related to both the development and use of the toolset to be resolved as effectively as possible. An important component of this approach is the direct incorporation of expert opinion. Some of the complicating factors to be addressed involve the code/model complexity, the existence of a large number of possible degradation processes, and an incompatibility between the length scales associated with device dimensions and the corrosion processes. Two of the key aspects of the desired predictive toolset are (1) a direct linkage of an electrical-system performance model with mechanistic-based, deterministic corrosion models, and (2) the explicit incorporation of a computational framework to quantify the effects of non-deterministic parameters (uncertainty). The selected approach and key elements of the toolset are first described in this paper. These descriptions are followed by some examples of how this toolset development process is being implemented.
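
A hedged sketch of the deterministic-model-plus-uncertainty coupling described above (not the Sandia toolset): a placeholder corrosion-growth model feeds a simple contact-resistance failure criterion, and Monte Carlo sampling of the uncertain parameters yields failure probability versus time. All models, parameters, and units below are illustrative assumptions.

# Placeholder deterministic corrosion model + Monte Carlo uncertainty propagation.
import numpy as np

rng = np.random.default_rng(4)
n = 50_000
rate = rng.lognormal(mean=np.log(0.5), sigma=0.4, size=n)   # film growth rate, nm/yr (assumed)
threshold = rng.normal(50.0, 5.0, size=n)                    # film thickness, nm, at which contact
                                                             # resistance exceeds spec (assumed)
for years in (10, 20, 40):
    thickness = rate * years                 # placeholder deterministic corrosion model
    p_fail = np.mean(thickness > threshold)  # circuit "fails" when threshold is exceeded
    print(f"{years} yr: estimated failure probability {p_fail:.3f}")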

More Details

A Hierarchical Bayes Approach to System Reliability Analysis

Robinson, David G.

The Comprehensive Test Ban Treaty of 1996 banned any future nuclear explosions or testing of nuclear weapons and created the CTBTO in Vienna to implement the treaty. The U.S. response to this was the cessation of all above- and below-ground nuclear testing. As such, all stockpile reliability assessments are now based on periodic testing of subsystems being stored in a wide variety of environments. This data provides a wealth of information and feeds a growing web of deterministic, physics-based computer models for assessment of stockpile reliability. Unfortunately, until 1996 it was difficult to relate the deterministic materials aging test data to component reliability. Since that time we have made great strides in mathematical techniques and computer tools that permit explicit relationships between materials degradation, e.g. corrosion or thermo-mechanical fatigue, and reliability. The resulting suite of tools is known as CRAX, and the mathematical library supporting these tools is Cassandra. However, these techniques ignore the historical data that is also available on similar systems in the nuclear stockpile, the DoD weapons complex, and even in commercial applications. Traditional statistical techniques commonly used in classical reliability assessment do not permit data from these sources to be easily included in the overall assessment of system reliability. An older, alternative approach based on Bayesian probability theory permits the inclusion of data from all applicable sources. Data from a variety of sources is brought together in a logical fashion through the repeated application of inductive mathematics. This research brings together existing mathematical methods, modifying and expanding those techniques as required, permitting data from a wide variety of sources to be combined in a logical fashion to increase confidence in the reliability assessment of the nuclear weapons stockpile. The application of this research is limited to those systems composed of discrete components, e.g. those that can be characterized as operating or not operating. However, there is nothing unique about the underlying principles, and the extension to continuous subsystems and systems is straightforward. The framework is also laid for the consideration of systems with multiple correlated failure modes. While an important consideration, time and resources limited the specific demonstration of these methods.
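
A much-simplified sketch of the flavor of this approach (not the CRAX/Cassandra code): Beta posteriors for a few discrete components, each notionally informed by pass/fail data from a different source, are combined by Monte Carlo into a posterior for series-system reliability. The component names, counts, and priors are invented.

# Simplified sketch: combine component-level pass/fail posteriors into a
# Monte Carlo posterior for series-system reliability.
import numpy as np

rng = np.random.default_rng(5)
# (tests, failures) per component, notionally from different data sources
component_data = {"component_A": (120, 1), "component_B": (80, 3), "component_C": (200, 2)}

n = 100_000
system_rel = np.ones(n)
for name, (tests, fails) in component_data.items():
    rel = rng.beta(1 + tests - fails, 1 + fails, size=n)   # posterior reliability samples
    system_rel *= rel                                      # series system: product of reliabilities

lo, hi = np.percentile(system_rel, [5, 95])
print(f"system reliability: mean {system_rel.mean():.4f}, 90% interval [{lo:.4f}, {hi:.4f}]")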

More Details

Sensitivity and uncertainty analysis of a polyurethane foam decomposition model

Hobbs, Michael L.; Robinson, David G.

Sensitivity/uncertainty analyses are not commonly performed on complex, finite-element engineering models because the analyses are time consuming, CPU intensive, nontrivial exercises that can lead to deceptive results. To illustrate these ideas, an analytical sensitivity/uncertainty analysis is used to determine the standard deviation and the primary factors affecting the burn velocity of polyurethane foam exposed to firelike radiative boundary conditions. The complex, finite element model has 25 input parameters that include chemistry, polymer structure, and thermophysical properties. The response variable was selected as the steady-state burn velocity calculated as the derivative of the burn front location versus time. The standard deviation of the burn velocity was determined by taking numerical derivatives of the response variable with respect to each of the 25 input parameters. Since the response variable is also a derivative, the standard deviation is essentially determined from a second derivative that is extremely sensitive to numerical noise. To minimize the numerical noise, 50-micron elements and approximately 1-msec time steps were required to obtain stable uncertainty results. The primary effect variable was shown to be the emissivity of the foam.
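
The propagation step described here follows the standard first-order pattern sigma_y^2 ~ sum_i (dy/dx_i)^2 * sigma_i^2 with central-difference derivatives. The sketch below applies it to a placeholder response function rather than the finite-element foam model, and all nominal values and standard deviations are invented.

# First-order variance propagation with central-difference sensitivities.
import numpy as np

def burn_velocity(x):
    """Placeholder response; the real quantity comes from the finite-element model."""
    return 0.1 * x[0] + 0.05 * x[1] ** 2 + 0.2 * np.sqrt(abs(x[2]))

x0 = np.array([1.0, 2.0, 4.0])       # nominal input values (illustrative)
sigma = np.array([0.1, 0.2, 0.5])    # input standard deviations (illustrative)

h = 1e-4
grads = np.empty_like(x0)
for i in range(len(x0)):
    dx = np.zeros_like(x0)
    dx[i] = h
    grads[i] = (burn_velocity(x0 + dx) - burn_velocity(x0 - dx)) / (2 * h)  # central difference

var = np.sum((grads * sigma) ** 2)   # first-order (linear) variance propagation
print("response std. dev. estimate:", np.sqrt(var))
print("dominant input index:", int(np.argmax(np.abs(grads) * sigma)))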

More Details