This report summarizes the work performed under the Sandia LDRD project "Adverse Event Prediction Using Graph-Augmented Temporal Analysis." The goal of the project was to de- velop a method for analyzing multiple time-series data streams to identify precursors provid- ing advance warning of the potential occurrence of events of interest. The proposed approach combined temporal analysis of each data stream with reasoning about relationships between data streams using a geospatial-temporal semantic graph. This class of problems is relevant to several important topics of national interest. In the course of this work we developed new temporal analysis techniques, including temporal analysis using Markov Chain Monte Carlo techniques, temporal shift algorithms to refine forecasts, and a version of Ripley's K-function extended to support temporal precursor identification. This report summarizes the project's major accomplishments, and gathers the abstracts and references for the publication sub- missions and reports that were prepared as part of this work. We then describe work in progress that is not yet ready for publication.
Geospatial semantic graphs provide a robust foundation for representing and analyzing remote sensor data. In particular, they support a variety of pattern search operations that capture the spatial and temporal relationships among the objects and events in the data. However, in the presence of large data corpora, even a carefully constructed search query may return a large number of unintended matches. This work considers the problem of calculating a quality score for each match to the query, given that the underlying data are uncertain. We present a preliminary evaluation of three methods for determining both match quality scores and associated uncertainty bounds, illustrated in the context of an example based on overhead imagery data.
This report summarizes preliminary research into uncertainty quantification for pattern ana- lytics within the context of the Pattern Analytics to Support High-Performance Exploitation and Reasoning (PANTHER) project. The primary focus of PANTHER was to make large quantities of remote sensing data searchable by analysts. The work described in this re- port adds nuance to both the initial data preparation steps and the search process. Search queries are transformed from does the specified pattern exist in the data? to how certain is the system that the returned results match the query? We show example results for both data processing and search, and discuss a number of possible improvements for each.
This report presents efforts to develop the use of in situ naturally-occurring noble gas tracers to evaluate transport mechanisms and deformation in shale hydrocarbon reservoirs. Noble gases are promising as shale reservoir diagnostic tools due to their sensitivity of transport to: shale pore structure; phase partitioning between groundwater, liquid, and gaseous hydrocarbons; and deformation from hydraulic fracturing. Approximately 1.5-year time-series of wellhead fluid samples were collected from two hydraulically-fractured wells. The noble gas compositions and isotopes suggest a strong signature of atmospheric contribution to the noble gases that mix with deep, old reservoir fluids. Complex mixing and transport of fracturing fluid and reservoir fluids occurs during production. Real-time laboratory measurements were performed on triaxially-deforming shale samples to link deformation behavior, transport, and gas tracer signatures. Finally, we present improved methods for production forecasts that borrow statistical strength from production data of nearby wells to reduce uncertainty in the forecasts.
This study began with a challenge from program area managers at Sandia National Laboratories to technical staff in the energy, climate, and infrastructure security areas: apply a systems-level perspective to existing science and technology program areas in order to determine technology gaps, identify new technical capabilities at Sandia that could be applied to these areas, and identify opportunities for innovation. The Arctic was selected as one of these areas for systems level analyses, and this report documents the results. In this study, an emphasis was placed on the arctic atmosphere since Sandia has been active in atmospheric research in the Arctic since 1997. This study begins with a discussion of the challenges and benefits of analyzing the Arctic as a system. It goes on to discuss current and future needs of the defense, scientific, energy, and intelligence communities for more comprehensive data products related to the Arctic; assess the current state of atmospheric measurement resources available for the Arctic; and explain how the capabilities at Sandia National Laboratories can be used to address the identified technological, data, and modeling needs of the defense, scientific, energy, and intelligence communities for Arctic support.
The project began as a e ort to support InLight and Lumidigm. With the sale of the companies to a non-New Mexico entity, the project then focused on supporting a new company Medici Technologies. The Small Business (SB) is attempting to quantify glucose in tissue using a series of short interferometer scans of the nger. Each scan is produced from a novel presentation of the nger to the device. The intent of the project is to identify and, if possible, implement improved methods for classi cation, feature selection, and training to improve the performance of predictive algorithms used for tissue classi cation.
This report summarizes the work performed under the project (3z(BStatitically significant relational data mining.(3y (BThe goal of the project was to add more statistical rigor to the fairly ad hoc area of data mining on graphs. Our goal was to develop better algorithms and better ways to evaluate algorithm quality. We concetrated on algorithms for community detection, approximate pattern matching, and graph similarity measures. Approximate pattern matching involves finding an instance of a relatively small pattern, expressed with tolerance, in a large graph of data observed with uncertainty. This report gathers the abstracts and references for the eight refereed publications that have appeared as part of this work. We then archive three pieces of research that have not yet been published. The first is theoretical and experimental evidence that a popular statistical measure for comparison of community assignments favors over-resolved communities over approximations to a ground truth. The second are statistically motivated methods for measuring the quality of an approximate match of a small pattern in a large graph. The third is a new probabilistic random graph model. Statisticians favor these models for graph analysis. The new local structure graph model overcomes some of the issues with popular models such as exponential random graph models and latent variable models.
The charter for adversarial delay is to hinder access to critical resources through the use of physical systems increasing an adversarys task time. The traditional method for characterizing access delay has been a simple model focused on accumulating times required to complete each task with little regard to uncertainty, complexity, or decreased efficiency associated with multiple sequential tasks or stress. The delay associated with any given barrier or path is further discounted to worst-case, and often unrealistic, times based on a high-level adversary, resulting in a highly conservative calculation of total delay. This leads to delay systems that require significant funding and personnel resources in order to defend against the assumed threat, which for many sites and applications becomes cost prohibitive. A new methodology has been developed that considers the uncertainties inherent in the problem to develop a realistic timeline distribution for a given adversary path. This new methodology incorporates advanced Bayesian statistical theory and methodologies, taking into account small sample size, expert judgment, human factors and threat uncertainty. The result is an algorithm that can calculate a probability distribution function of delay times directly related to system risk. Through further analysis, the access delay analyst or end user can use the results in making informed decisions while weighing benefits against risks, ultimately resulting in greater system effectiveness with lower cost.