Publications

48 Results

Adverse Event Prediction Using Graph-Augmented Temporal Analysis: Final Report

Brost, Randolph B.; Carrier, Erin E.; Carroll, Michelle C.; Groth, Katrina M.; Kegelmeyer, William P.; Leung, Vitus J.; Link, Hamilton E.; Patterson, Andrew J.; Phillips, Cynthia A.; Richter, Samuel N.; Robinson, David G.; Staid, Andrea S.; Woodbridge, Diane M.-K.

This report summarizes the work performed under the Sandia LDRD project "Adverse Event Prediction Using Graph-Augmented Temporal Analysis." The goal of the project was to develop a method for analyzing multiple time-series data streams to identify precursors providing advance warning of the potential occurrence of events of interest. The proposed approach combined temporal analysis of each data stream with reasoning about relationships between data streams using a geospatial-temporal semantic graph. This class of problems is relevant to several important topics of national interest. In the course of this work we developed new temporal analysis techniques, including temporal analysis using Markov Chain Monte Carlo techniques, temporal shift algorithms to refine forecasts, and a version of Ripley's K-function extended to support temporal precursor identification. This report summarizes the project's major accomplishments and gathers the abstracts and references for the publication submissions and reports that were prepared as part of this work. We then describe work in progress that is not yet ready for publication.
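The report's extension of Ripley's K-function is not reproduced here, but the underlying idea, counting how often events of interest occur within a lag window after candidate precursors, can be sketched in a few lines. The function, data, and parameter values below are illustrative, not taken from the report:

```python
def temporal_cross_k(precursors, events, lag, horizon):
    """Toy, directional cross-K statistic for 1-D time series: counts
    events that occur within `lag` time units AFTER a precursor,
    normalized by the observation window length `horizon` (analogous to
    Ripley's K normalization). Illustrative only; not the report's
    actual estimator."""
    n_a, n_b = len(precursors), len(events)
    if n_a == 0 or n_b == 0:
        return 0.0
    pairs = sum(1 for a in precursors for b in events if 0 < b - a <= lag)
    return horizon * pairs / (n_a * n_b)

# Hypothetical precursor/event timestamps on a 24-hour window.
precursors = [1.0, 5.0, 9.0]
events = [2.0, 6.0, 20.0]
k_short = temporal_cross_k(precursors, events, lag=2.0, horizon=24.0)
k_long = temporal_cross_k(precursors, events, lag=12.0, horizon=24.0)
```

Comparing the empirical statistic against its expected value under a homogeneous (Poisson) baseline indicates whether events cluster after the candidate precursor more often than chance would suggest.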


Computing quality scores and uncertainty for approximate pattern matching in geospatial semantic graphs

Statistical Analysis and Data Mining

Stracuzzi, David J.; Brost, Randolph B.; Phillips, Cynthia A.; Robinson, David G.; Wilson, Alyson G.; Woodbridge, Diane W.

Geospatial semantic graphs provide a robust foundation for representing and analyzing remote sensor data. In particular, they support a variety of pattern search operations that capture the spatial and temporal relationships among the objects and events in the data. However, in the presence of large data corpora, even a carefully constructed search query may return a large number of unintended matches. This work considers the problem of calculating a quality score for each match to the query, given that the underlying data are uncertain. We present a preliminary evaluation of three methods for determining both match quality scores and associated uncertainty bounds, illustrated in the context of an example based on overhead imagery data.


Preliminary Results on Uncertainty Quantification for Pattern Analytics

Stracuzzi, David J.; Brost, Randolph B.; Chen, Maximillian G.; Malinas, Rebecca; Peterson, Matthew G.; Phillips, Cynthia A.; Robinson, David G.; Woodbridge, Diane W.

This report summarizes preliminary research into uncertainty quantification for pattern analytics within the context of the Pattern Analytics to Support High-Performance Exploitation and Reasoning (PANTHER) project. The primary focus of PANTHER was to make large quantities of remote sensing data searchable by analysts. The work described in this report adds nuance to both the initial data preparation steps and the search process. Search queries are transformed from "does the specified pattern exist in the data?" to "how certain is the system that the returned results match the query?" We show example results for both data processing and search, and discuss a number of possible improvements for each.


Appraisal of transport and deformation in shale reservoirs using natural noble gas tracers

Heath, Jason; Kuhlman, Kristopher L.; Robinson, David G.; Bauer, Stephen J.; Gardner, W.P.

This report presents efforts to develop the use of in situ naturally occurring noble gas tracers to evaluate transport mechanisms and deformation in shale hydrocarbon reservoirs. Noble gases are promising as shale reservoir diagnostic tools because their transport is sensitive to shale pore structure; phase partitioning between groundwater, liquid, and gaseous hydrocarbons; and deformation from hydraulic fracturing. Approximately 1.5-year time series of wellhead fluid samples were collected from two hydraulically fractured wells. The noble gas compositions and isotopes suggest a strong signature of atmospheric contribution to the noble gases that mix with deep, old reservoir fluids. Complex mixing and transport of fracturing fluid and reservoir fluids occurs during production. Real-time laboratory measurements were performed on triaxially deforming shale samples to link deformation behavior, transport, and gas tracer signatures. Finally, we present improved methods for production forecasts that borrow statistical strength from production data of nearby wells to reduce uncertainty in the forecasts.


Arctic Climate Systems Analysis

Ivey, Mark D.; Robinson, David G.; Boslough, Mark B.; Backus, George A.; Peterson, Kara J.; van Bloemen Waanders, Bart G.; Swiler, Laura P.; Desilets, Darin M.; Reinert, Rhonda K.

This study began with a challenge from program area managers at Sandia National Laboratories to technical staff in the energy, climate, and infrastructure security areas: apply a systems-level perspective to existing science and technology program areas in order to determine technology gaps, identify new technical capabilities at Sandia that could be applied to these areas, and identify opportunities for innovation. The Arctic was selected as one of these areas for systems-level analyses, and this report documents the results. In this study, an emphasis was placed on the Arctic atmosphere, since Sandia has been active in atmospheric research in the Arctic since 1997. This study begins with a discussion of the challenges and benefits of analyzing the Arctic as a system. It goes on to discuss current and future needs of the defense, scientific, energy, and intelligence communities for more comprehensive data products related to the Arctic; assess the current state of atmospheric measurement resources available for the Arctic; and explain how the capabilities at Sandia National Laboratories can be used to address the identified technological, data, and modeling needs of the defense, scientific, energy, and intelligence communities for Arctic support.


Tissue Classification

Robinson, David G.

The project began as an effort to support InLight and Lumidigm. With the sale of the companies to a non-New Mexico entity, the project then focused on supporting a new company, Medici Technologies. The Small Business (SB) is attempting to quantify glucose in tissue using a series of short interferometer scans of the finger. Each scan is produced from a novel presentation of the finger to the device. The intent of the project is to identify and, if possible, implement improved methods for classification, feature selection, and training to improve the performance of predictive algorithms used for tissue classification.


Statistically significant relational data mining

Berry, Jonathan W.; Leung, Vitus J.; Phillips, Cynthia A.; Pinar, Ali P.; Robinson, David G.

This report summarizes the work performed under the project "Statistically significant relational data mining." The goal of the project was to add more statistical rigor to the fairly ad hoc area of data mining on graphs. Our goal was to develop better algorithms and better ways to evaluate algorithm quality. We concentrated on algorithms for community detection, approximate pattern matching, and graph similarity measures. Approximate pattern matching involves finding an instance of a relatively small pattern, expressed with tolerance, in a large graph of data observed with uncertainty. This report gathers the abstracts and references for the eight refereed publications that have appeared as part of this work. We then archive three pieces of research that have not yet been published. The first is theoretical and experimental evidence that a popular statistical measure for comparison of community assignments favors over-resolved communities over approximations to a ground truth. The second is a set of statistically motivated methods for measuring the quality of an approximate match of a small pattern in a large graph. The third is a new probabilistic random graph model. Statisticians favor these models for graph analysis. The new local structure graph model overcomes some of the issues with popular models such as exponential random graph models and latent variable models.
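The first archived result, that a popular comparison measure can score an error-free but over-resolved partition above a nearly correct one, is easy to reproduce in a small example. Normalized mutual information (NMI) is used here as one such measure; the choice of measure and the data are illustrative, not taken from the report:

```python
from math import log

def entropy(labels):
    # Shannon entropy (natural log) of a label assignment.
    n = len(labels)
    counts = {}
    for l in labels:
        counts[l] = counts.get(l, 0) + 1
    return -sum(c / n * log(c / n) for c in counts.values())

def mutual_info(a, b):
    # Mutual information between two label assignments of the same nodes.
    n = len(a)
    joint, pa, pb = {}, {}, {}
    for x, y in zip(a, b):
        joint[(x, y)] = joint.get((x, y), 0) + 1
        pa[x] = pa.get(x, 0) + 1
        pb[y] = pb.get(y, 0) + 1
    return sum(c / n * log((c / n) / ((pa[x] / n) * (pb[y] / n)))
               for (x, y), c in joint.items())

def nmi(a, b):
    h = entropy(a) + entropy(b)
    return 2 * mutual_info(a, b) / h if h else 1.0

truth = [0, 0, 0, 0, 1, 1, 1, 1]
over_resolved = [0, 0, 1, 1, 2, 2, 3, 3]  # each true community split in two, zero errors
near_correct = [0, 0, 0, 1, 1, 1, 1, 1]   # correct resolution, one misassigned node
```

Here `nmi(truth, over_resolved)` exceeds `nmi(truth, near_correct)`, so the measure prefers the over-resolved partition even though the other differs from ground truth by a single node.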


Development of a statistically based access delay timeline methodology

Rivera, Wayne G.; Robinson, David G.; Wyss, Gregory D.; Hendrickson, Stacey M.

The charter for adversarial delay is to hinder access to critical resources through the use of physical systems that increase an adversary's task time. The traditional method for characterizing access delay has been a simple model focused on accumulating the times required to complete each task, with little regard to uncertainty, complexity, or the decreased efficiency associated with multiple sequential tasks or stress. The delay associated with any given barrier or path is further discounted to worst-case, and often unrealistic, times based on a high-level adversary, resulting in a highly conservative calculation of total delay. This leads to delay systems that require significant funding and personnel resources in order to defend against the assumed threat, which for many sites and applications becomes cost prohibitive. A new methodology has been developed that considers the uncertainties inherent in the problem to develop a realistic timeline distribution for a given adversary path. This new methodology incorporates advanced Bayesian statistical theory and methods, taking into account small sample size, expert judgment, human factors, and threat uncertainty. The result is an algorithm that can calculate a probability distribution function of delay times directly related to system risk. Through further analysis, the access delay analyst or end user can use the results to make informed decisions while weighing benefits against risks, ultimately resulting in greater system effectiveness at lower cost.
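The timeline-distribution idea can be illustrated with a small Monte Carlo sketch: each barrier's task time is drawn from a lognormal distribution (a common choice for task times; the report's actual Bayesian model, expert-judgment elicitation, and human-factors adjustments are not reproduced here), and the path total is summarized as a distribution rather than a single worst-case number. The barrier medians and spreads below are invented:

```python
import math
import random

def delay_distribution(tasks, n_samples=20000, seed=1):
    """tasks: list of (median_minutes, gsd) pairs, one per barrier along
    the adversary path, where gsd is the geometric standard deviation.
    Draws lognormal task times, sums them along the path, and returns
    the sorted sample of total delay times. Illustrative sketch only."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_samples):
        total = 0.0
        for median, gsd in tasks:
            total += rng.lognormvariate(math.log(median), math.log(gsd))
        totals.append(total)
    totals.sort()
    return totals

def percentile(sorted_xs, q):
    return sorted_xs[int(q * (len(sorted_xs) - 1))]

# Hypothetical three-barrier path (medians in minutes).
totals = delay_distribution([(5.0, 1.5), (10.0, 2.0), (3.0, 1.3)])
p05, p50, p95 = (percentile(totals, q) for q in (0.05, 0.50, 0.95))
```

A planner can then read the 5th percentile as a defensible lower bound on delay, instead of relying on a single worst-case time.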


SNL software manual for the ACS Data Analytics Project

Stearley, Jon S.; Robinson, David G.; Hooper, Russell H.; Stickland, Michael S.; McLendon, William C.; Rodrigues, Arun

In the ACS Data Analytics Project (also known as 'YumYum'), a supercomputer is modeled as a graph of components and dependencies, jobs and faults are simulated, and component fault rates are estimated using the graph structure and job pass/fail outcomes. This report documents the successful completion of all SNL deliverables and tasks, describes the software written by SNL for the project, and presents the data it generates. Readers should understand what the software tools are, how they fit together, and how to use them to reproduce the presented data and additional experiments as desired. The SNL YumYum tools provide the novel simulation and inference capabilities desired by ACS. SNL also developed and implemented a new algorithm, which provides faster estimates, at finer component granularity, on arbitrary directed acyclic graphs.


Tracking topic birth and death in LDA

Wilson, Andrew T.; Robinson, David G.

Most topic modeling algorithms that address the evolution of documents over time use the same number of topics at all times. This obscures the common occurrence in the data where new subjects arise and old ones diminish or disappear entirely. We propose an algorithm to model the birth and death of topics within an LDA-like framework. The user selects an initial number of topics, after which new topics are created and retired without further supervision. Our approach also accommodates many of the acceleration and parallelization schemes developed in recent years for standard LDA. In recent years, topic modeling algorithms such as latent semantic analysis (LSA)[17], latent Dirichlet allocation (LDA)[10] and their descendants have offered a powerful way to explore and interrogate corpora far too large for any human to grasp without assistance. Using such algorithms we are able to search for similar documents, model and track the volume of topics over time, search for correlated topics or model them with a hierarchy. Most of these algorithms are intended for use with static corpora where the number of documents and the size of the vocabulary are known in advance. Moreover, almost all current topic modeling algorithms fix the number of topics as one of the input parameters and keep it fixed across the entire corpus. While this is appropriate for static corpora, it becomes a serious handicap when analyzing time-varying data sets where topics come and go as a matter of course. This is doubly true for online algorithms that may not have the option of revising earlier results in light of new data. To be sure, these algorithms will account for changing data one way or another, but without the ability to adapt to structural changes such as entirely new topics they may do so in counterintuitive ways.
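A minimal sketch of the bookkeeping side of this idea, tracking each topic's mass across time windows and flagging births and deaths against thresholds, is shown below. The thresholds, window structure, and topic weights are invented for illustration; the paper's actual mechanism operates inside the LDA inference itself:

```python
def track_topics(window_weights, birth_thresh=0.05, death_thresh=0.01):
    """window_weights: list of dicts {topic_id: proportion}, one dict per
    time window (assumes every tracked topic is reported in each window).
    A topic is 'born' in the first window where its mass reaches
    birth_thresh, and 'dies' in the first later window where it falls
    below death_thresh. Toy stand-in for the paper's LDA-based scheme."""
    born, died = {}, {}
    for t, weights in enumerate(window_weights):
        for topic, mass in weights.items():
            if topic not in born and mass >= birth_thresh:
                born[topic] = t
            elif topic in born and topic not in died and mass < death_thresh:
                died[topic] = t
    return born, died

# Hypothetical per-window topic proportions: topic C is born in window 1,
# topic B collapses in window 2.
windows = [
    {"A": 0.60, "B": 0.40},
    {"A": 0.50, "B": 0.30, "C": 0.20},
    {"A": 0.55, "B": 0.005, "C": 0.445},
]
born, died = track_topics(windows)
```

The returned dictionaries give the window index at which each topic appeared or was retired, which is the kind of birth/death timeline the algorithm produces without user supervision.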


A Model-Based Case for Redundant Computation

Stearley, Jon S.; Robinson, David G.; Ferreira, Kurt

Despite its seemingly nonsensical cost, we show through modeling and simulation that redundant computation merits full consideration as a resilience strategy for next-generation systems. Without revolutionary breakthroughs in failure rates, part counts, or stable-storage bandwidths, it has been shown that the utility of Exascale systems will be crushed by the overheads of traditional checkpoint/restart mechanisms. Alternate resilience strategies must be considered, and redundancy is a proven unrivaled approach in many domains. We develop a distribution-independent model for job interrupts on systems of arbitrary redundancy, adapt Daly’s model for total application runtime, and find that his estimate for optimal checkpoint interval remains valid for redundant systems. We then identify conditions where redundancy is more cost effective than non-redundancy. These are done in the context of the number one supercomputers of the last decade, showing that thorough consideration of redundant computation is timely - if not overdue.
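The Daly estimate referenced above is simple enough to state in code. The sketch below implements only Daly's well-known first-order optimum for the time between checkpoints, sqrt(2*delta*M) - delta; the paper's distribution-independent interrupt model for redundant systems is not reproduced, and the example numbers are hypothetical:

```python
import math

def daly_optimal_interval(delta, mtti):
    """Daly's first-order optimal compute time between checkpoints.
    delta: time to write one checkpoint; mtti: mean time to interrupt.
    Valid in the regime delta < mtti / 2."""
    if delta >= mtti / 2:
        raise ValueError("first-order formula assumes delta < mtti/2")
    return math.sqrt(2.0 * delta * mtti) - delta

# Hypothetical system: 300 s checkpoint writes, 1-hour mean time to interrupt.
tau = daly_optimal_interval(300.0, 3600.0)
```

Redundancy stretches the effective mean time to interrupt, which lengthens the optimal interval and shrinks checkpoint overhead; that trade-off is what the paper quantifies.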


Assessing the Near-Term Risk of Climate Uncertainty: Interdependencies among the U.S. States

Backus, George A.; Trucano, Timothy G.; Robinson, David G.; Adams, Brian M.; Richards, Elizabeth H.; Siirola, John D.; Boslough, Mark B.; Taylor, Mark A.; Conrad, Stephen H.; Kelic, Andjelka; Roach, Jesse D.; Warren, Drake E.; Ballantine, Marissa D.; Stubblefield, W.A.; Snyder, Lillian A.; Finley, Ray E.; Horschel, Daniel S.; Ehlen, Mark E.; Klise, Geoffrey T.; Malczynski, Leonard A.; Stamber, Kevin L.; Tidwell, Vincent C.; Vargas, Vanessa N.; Zagonel, Aldo A.

Abstract not provided.

Statistical language analysis for automatic exfiltration event detection

Robinson, David G.

This paper discusses the recent development of a statistical approach for the automatic identification of anomalous network activity that is characteristic of exfiltration events. This approach is based on the language processing method referred to as latent Dirichlet allocation (LDA). Cyber security experts currently depend heavily on a rule-based framework for initial detection of suspect network events. The application of the rule set typically results in an extensive list of suspect network events that are then further explored manually for suspicious activity. The ability to identify anomalous network events is heavily dependent on the experience of the security personnel wading through the network log. Limitations of this approach are clear: rule-based systems only apply to exfiltration behavior that has previously been observed, and experienced cyber security personnel are rare commodities. Since the new methodology is not a discrete rule-based approach, it is more difficult for an insider to disguise exfiltration events. A further benefit is that the methodology provides a risk-based approach that can be implemented in a continuous, dynamic, or evolutionary fashion. This permits suspect network activity to be identified early, with a quantifiable risk associated with decision making when responding to suspicious activity.
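The paper's LDA-based detector cannot be reconstructed from the abstract, but the underlying shift from rules to risk scores can be illustrated with a much simpler surprisal score: rank each log line by how improbable its tokens are under a baseline model of normal traffic. Everything below (the smoothed unigram model and the sample log lines) is an invented stand-in:

```python
import math
from collections import Counter

def baseline_model(training_lines, alpha=1.0):
    """Build a smoothed unigram log-probability function from logs of
    normal activity. A deliberately crude stand-in for LDA."""
    counts = Counter(tok for line in training_lines for tok in line.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # reserve one slot of mass for unseen tokens
    def logp(tok):
        return math.log((counts.get(tok, 0) + alpha) / (total + alpha * vocab))
    return logp

def surprisal(line, logp):
    """Average negative log-probability per token; higher = more suspect."""
    toks = line.split()
    return -sum(logp(t) for t in toks) / max(len(toks), 1)

# Hypothetical log lines.
normal_logs = [
    "GET /index.html 200",
    "GET /home 200",
    "GET /index.html 200",
    "GET /about 200",
]
logp = baseline_model(normal_logs)
routine = surprisal("GET /index.html 200", logp)
suspect = surprisal("POST /exfil 500", logp)
```

Thresholding the score, or tracking it over time, yields the continuous, risk-based triage the paper argues for, rather than a binary rule match.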


Methodology assessment and recommendations for the Mars science laboratory launch safety analysis

Bessette, Gregory B.; Lipinski, Ronald J.; Bixler, Nathan E.; Hewson, John C.; Robinson, David G.; Potter, Donald L.; Atcitty, Christopher B.; Dodson, Brian W.; Maclean, Heather J.; Sturgis, Beverly R.

The Department of Energy has assigned to Sandia National Laboratories the responsibility of producing a Safety Analysis Report (SAR) for the plutonium-dioxide fueled Multi-Mission Radioisotope Thermoelectric Generator (MMRTG) proposed to be used in the Mars Science Laboratory (MSL) mission. The National Aeronautics and Space Administration (NASA) is anticipating a launch in fall of 2009, and the SAR will play a critical role in the launch approval process. As in past safety evaluations of MMRTG missions, a wide range of potential accident conditions differing widely in probability and severity must be considered, and the resulting risk to the public will be presented in the form of probability distribution functions of health effects in terms of latent cancer fatalities. The basic descriptions of accident cases will be provided by NASA in the MSL SAR Databook for the mission, and on the basis of these descriptions, Sandia will apply a variety of sophisticated computational simulation tools to evaluate the potential release of plutonium dioxide, its transport to human populations, and the consequent health effects. The first step in carrying out this project is to evaluate the existing computational analysis tools (computer codes) for suitability to the analysis and, when appropriate, to identify areas where modifications or improvements are warranted. The overall calculation of health risks can be divided into three levels of analysis. Level A involves detailed simulations of the interactions of the MMRTG or its components with the broad range of insults (e.g., shrapnel, blast waves, fires) posed by the various accident environments. There are a number of candidate codes for this level; they are typically high-resolution computational simulation tools that capture details of each type of interaction and that can predict damage and plutonium dioxide release for a range of choices of controlling parameters. Level B utilizes these detailed results to study many thousands of possible event sequences and to build up a statistical representation of the releases for each accident case. A code to carry out this process will have to be developed or adapted from previous MMRTG missions. Finally, Level C translates the release (or "source term") information from Level B into public risk by applying models for atmospheric transport and the health consequences of exposure to the released plutonium dioxide. A number of candidate codes for this level of analysis are available. This report surveys the range of available codes and tools for each of these levels and makes recommendations for which choices are best for the MSL mission. It also identifies areas where improvements to the codes are needed. In some cases a second tier of codes may be identified to provide supporting or clarifying insight about particular issues. The main focus of the methodology assessment is to identify a suite of computational tools that can produce a high-quality SAR that can be successfully reviewed by external bodies (such as the Interagency Nuclear Safety Review Panel) on the schedule established by NASA and DOE.


Impact of distributed energy resources on the reliability of a critical telecommunications facility

Robinson, David G.; Atcitty, Christopher B.; Zuffranieri, Jason Z.

This report documents a probabilistic risk assessment of an existing power supply system at a large telecommunications office. The focus is on characterizing the increase in the reliability of power supply through the use of two alternative power configurations. Telecommunications has been identified by the Department of Homeland Security as a critical infrastructure to the United States. Failures in the power systems supporting major telecommunications service nodes are a main contributor to major telecommunications outages. A logical approach to improve the robustness of telecommunication facilities would be to increase the depth and breadth of technologies available to restore power in the face of power outages. Distributed energy resources such as fuel cells and gas turbines could provide one more onsite electric power source to provide backup power, if batteries and diesel generators fail. The analysis is based on a hierarchical Bayesian approach and focuses on the failure probability associated with each of three possible facility configurations, along with assessment of the uncertainty or confidence level in the probability of failure. A risk-based characterization of the final best configuration is presented.
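The conjugate building block beneath a Bayesian reliability comparison like this can be sketched in a few lines. The full hierarchy (sharing strength across configurations) and the actual facility data are not reproduced; the flat prior and test counts below are invented:

```python
def beta_binomial_posterior(alpha0, beta0, failures, trials):
    """Conjugate Bayesian update: Beta(alpha0, beta0) prior on the
    failure probability, binomial test evidence. Returns the posterior
    mean and standard deviation of the failure probability."""
    a = alpha0 + failures
    b = beta0 + trials - failures
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var ** 0.5

# Hypothetical backup-power test campaigns under a flat Beta(1, 1) prior:
# baseline configuration vs. one augmented with distributed generation.
base_mean, base_sd = beta_binomial_posterior(1, 1, failures=2, trials=50)
der_mean, der_sd = beta_binomial_posterior(1, 1, failures=0, trials=50)
```

Comparing posterior means together with their posterior spread, rather than point estimates alone, is the shape of the configuration comparison the report performs.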


A Bayesian approach for health monitoring of critical systems

Robinson, David G.; Zuffranieri, Jason Z.

Bayesian medical monitoring is a concept based on using real-time performance-related data to make statistical predictions about a patient's future health. The following paper discusses the fundamentals behind the medical monitoring concept and the application to monitoring the health of nuclear reactors. Necessary assumptions are discussed regarding distributions and failure-rate calculations. A simple example is performed to illustrate the effectiveness of the methods. The methods perform very well for the thirteen subjects in the example, with a clear failure sequence identified for eleven of the subjects.


Vulnerability of critical infrastructures : identifying critical nodes

Robinson, David G.; Cox, Roger G.

The objective of this research was the development of tools and techniques for the identification of critical nodes within critical infrastructures. These are nodes that, if disrupted through natural events or terrorist action, would cause the most widespread, immediate damage. This research focuses on one particular element of the national infrastructure: the bulk power system. Through the identification of critical elements and the quantification of the consequences of their failure, site-specific vulnerability analyses can be focused at those locations where additional security measures could be effectively implemented. In particular, with appropriate sizing and placement within the grid, distributed generation in the form of regional power parks may reduce or even prevent the impact of widespread network power outages. Even without additional security measures, increased awareness of sensitive power grid locations can provide a basis for more effective national, state and local emergency planning. A number of methods for identifying critical nodes were investigated: small-world (or network theory), polyhedral dynamics, and an artificial intelligence-based search method - particle swarm optimization. PSO was found to be the only viable approach and was applied to a variety of industry accepted test networks to validate the ability of the approach to identify sets of critical nodes. The approach was coded in a software package called Buzzard and integrated with a traditional power flow code. A number of industry accepted test networks were employed to validate the approach. The techniques (and software) are not unique to power grid network, but could be applied to a variety of complex, interacting infrastructures.
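The particle swarm optimization scheme that proved viable can be sketched for a continuous test function; the report applies a discrete variant to node selection in power networks, and that encoding (and the Buzzard software) is not reproduced here. All parameter values below are conventional defaults, not the report's:

```python
import random

def pso(f, dim, bounds, n_particles=20, iters=200,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize f over [lo, hi]^dim with a standard global-best PSO:
    each particle's velocity blends inertia, attraction to its own best
    position, and attraction to the swarm's best position."""
    rng = random.Random(seed)
    lo, hi = bounds
    xs = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in xs]
    pbest_f = [f(x) for x in xs]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vs[i][d] = (w * vs[i][d]
                            + c1 * r1 * (pbest[i][d] - xs[i][d])
                            + c2 * r2 * (gbest[d] - xs[i][d]))
                xs[i][d] = min(hi, max(lo, xs[i][d] + vs[i][d]))
            fx = f(xs[i])
            if fx < pbest_f[i]:
                pbest[i], pbest_f[i] = xs[i][:], fx
                if fx < gbest_f:
                    gbest, gbest_f = xs[i][:], fx
    return gbest, gbest_f

# Sanity check on the sphere function (minimum 0 at the origin).
best, val = pso(lambda x: sum(xi * xi for xi in x), dim=3, bounds=(-5.0, 5.0))
```

For critical-node identification, the position vector would instead encode a candidate node set and f would score the network consequence of removing it.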


A Modeling Approach for Predicting the Effect of Corrosion on Electrical-Circuit Reliability

Braithwaite, J.W.; Sorensen, Neil R.; Robinson, David G.; Chen, Ken S.; Bogdan, Carolyn W.

An analytical capability is being developed that can be used to predict the effect of corrosion on the performance of electrical circuits and systems. The availability of this ''toolset'' will dramatically improve our ability to influence device and circuit design, address and remediate field occurrences, and determine real limits for circuit service life. In pursuit of this objective, we have defined and adopted an iterative, statistical-based, top-down approach that will permit very formidable and real obstacles related to both the development and use of the toolset to be resolved as effectively as possible. An important component of this approach is the direct incorporation of expert opinion. Some of the complicating factors to be addressed involve the code/model complexity, the existence of a large number of possible degradation processes, and an incompatibility between the length scales associated with device dimensions and the corrosion processes. Two of the key aspects of the desired predictive toolset are (1) a direct linkage of an electrical-system performance model with mechanistic-based, deterministic corrosion models, and (2) the explicit incorporation of a computational framework to quantify the effects of non-deterministic parameters (uncertainty). The selected approach and key elements of the toolset are first described in this paper. These descriptions are followed by some examples of how this toolset development process is being implemented.


A Hierarchical Bayes Approach to System Reliability Analysis

Robinson, David G.

The Comprehensive Test Ban Treaty of 1996 banned any future nuclear explosions or testing of nuclear weapons and created the CTBTO in Vienna to implement the treaty. The U.S. response to this was the cessation of all above and below ground nuclear testing. As such, all stockpile reliability assessments are now based on periodic testing of subsystems being stored in a wide variety of environments. This data provides a wealth of information and feeds a growing web of deterministic, physics-based computer models for assessment of stockpile reliability. Unfortunately until 1996 it was difficult to relate the deterministic materials aging test data to component reliability. Since that time we have made great strides in mathematical techniques and computer tools that permit explicit relationships between materials degradation, e.g. corrosion, thermo-mechanical fatigue, and reliability. The resulting suite of tools is known as CRAX and the mathematical library supporting these tools is Cassandra. However, these techniques ignore the historical data that is also available on similar systems in the nuclear stockpile, the DoD weapons complex and even in commercial applications. Traditional statistical techniques commonly used in classical reliability assessment do not permit data from these sources to be easily included in the overall assessment of system reliability. An older, alternative approach based on Bayesian probability theory permits the inclusion of data from all applicable sources. Data from a variety of sources is brought together in a logical fashion through the repeated application of inductive mathematics. This research brings together existing mathematical methods, modifies and expands those techniques as required, permitting data from a wide variety of sources to be combined in a logical fashion to increase the confidence in the reliability assessment of the nuclear weapons stockpile.
The application of this research is limited to those systems composed of discrete components, e.g. those that can be characterized as operating or not operating. However, there is nothing unique about the underlying principles and the extension to continuous subsystem/systems is straightforward. The framework is also laid for the consideration of systems with multiple correlated failure modes. While an important consideration, time and resources limited the specific demonstration of these methods.


Sensitivity and uncertainty analysis of a polyurethane foam decomposition model

Hobbs, Michael L.; Robinson, David G.

Sensitivity/uncertainty analyses are not commonly performed on complex, finite-element engineering models because the analyses are time consuming, CPU intensive, nontrivial exercises that can lead to deceptive results. To illustrate these ideas, an analytical sensitivity/uncertainty analysis is used to determine the standard deviation and the primary factors affecting the burn velocity of polyurethane foam exposed to firelike radiative boundary conditions. The complex, finite element model has 25 input parameters that include chemistry, polymer structure, and thermophysical properties. The response variable was selected as the steady-state burn velocity calculated as the derivative of the burn front location versus time. The standard deviation of the burn velocity was determined by taking numerical derivatives of the response variable with respect to each of the 25 input parameters. Since the response variable is also a derivative, the standard deviation is essentially determined from a second derivative that is extremely sensitive to numerical noise. To minimize the numerical noise, 50-micron elements and approximately 1-msec time steps were required to obtain stable uncertainty results. The primary effect variable was shown to be the emissivity of the foam.
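The core calculation, first-order uncertainty propagation through numerically differenced sensitivities, can be sketched generically. The toy response function below stands in for the finite-element model, and the paper's point about numerical noise applies exactly here: the step size h trades truncation error against round-off:

```python
def propagate_uncertainty(f, x0, sigmas, rel_step=1e-6):
    """Delta-method uncertainty propagation: estimate df/dx_i by central
    finite differences, then combine sigma_y^2 = sum_i (df/dx_i)^2 s_i^2.
    Assumes independent inputs; a sketch, not the paper's code."""
    grads = []
    for i, xi in enumerate(x0):
        h = rel_step * max(abs(xi), 1.0)
        xp, xm = list(x0), list(x0)
        xp[i] += h
        xm[i] -= h
        grads.append((f(xp) - f(xm)) / (2.0 * h))
    var = sum((g * s) ** 2 for g, s in zip(grads, sigmas))
    return var ** 0.5, grads

# Toy stand-in response: y = x0 * x1 + x2, with invented input sigmas.
sd, grads = propagate_uncertainty(
    lambda x: x[0] * x[1] + x[2],
    x0=[2.0, 3.0, 1.0],
    sigmas=[0.1, 0.2, 0.05],
)
```

The gradient magnitudes also rank the inputs by influence, which is how a single dominant factor (the foam emissivity in the paper) is identified.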
