The ground truth program used simulations as test beds for social science research methods. The simulations had known ground truth and were capable of producing large amounts of data. This allowed research teams to run experiments and ask questions of these simulations much as social scientists study real-world systems, and enabled robust evaluation of their causal inference, prediction, and prescription capabilities. We tested three hypotheses about research effectiveness using data from the ground truth program, specifically looking at the influence of complexity, causal understanding, and data collection on performance. We found some evidence that system complexity and causal understanding influenced research performance, but no evidence that data availability contributed. The ground truth program may be the first robust coupling of simulation test beds with an experimental framework capable of teasing out factors that determine the success of social science research.
This report describes research that uses data science and machine learning methods to distinguish targeted genome editing from natural mutation and sequencing-machine noise. Genome editing capabilities have existed for more than 20 years, and the efficiency of these techniques has improved dramatically in the last 5+ years, notably with the rise of CRISPR-Cas technology. Whether a specific genome has been the target of an edit is a concern for U.S. national security. The research detailed in this report provides first steps toward addressing this concern. Our research requires a large amount of data, so we invested considerable time collecting and processing it. We use an ensemble of decision tree and deep neural network machine learning methods, as well as anomaly detection, to detect genome edits given either whole-exome or whole-genome DNA reads. Tested against samples held out during training, our algorithms detect edits significantly better than random guessing, achieving high F1 and recall scores, as well as high precision overall.
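For readers unfamiliar with the reported metrics, the following is a minimal sketch of how precision, recall, and F1 can be computed on held-out samples. It is not the report's evaluation code: the function name `precision_recall_f1` and the convention that label 1 means "edited" and 0 means "not edited" are assumptions made for illustration.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels.

    Assumption (not from the report): 1 = edited genome, 0 = unedited.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # edits correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # unedited samples flagged as edited
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # edits that were missed

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2.0 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical held-out labels and predictions, for illustration only.
    held_out_truth = [1, 1, 1, 0, 0, 0]
    predictions = [1, 1, 0, 1, 0, 0]
    print(precision_recall_f1(held_out_truth, predictions))
```

F1 is the harmonic mean of precision and recall, so a high F1 score requires both to be high; a detector that flags every sample as edited would reach perfect recall but poor precision.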
This report presents a framework to evaluate the impact of a high-altitude electromagnetic pulse (HEMP) event on a bulk electric power grid. The report limits itself to modeling the impact of the EMP E1 and E3 components. The co-simulation of E1 and E3 is presented in detail, and the focus of the report is on the framework rather than actual results. This approach is highly conservative because E1 and E3 are not maximized by the same event characteristics and may overlap only slightly. The results shown in this report are based on a synthetic grid with synthetic data and a limited exemplary EMP model. The framework can also be used to analyze the impact of other threat scenarios, both manmade and natural. This report describes a Monte Carlo-based methodology to probabilistically quantify the transient response of the power grid to a HEMP event. The approach characterizes the system response to HEMP events in several steps, focusing on the E1 and E3 components of the event. 1) Obtain component failure data from HEMP testing of components and create component failure models. Use each component failure model to create a component failure conditional probability density function (PDF) that is a function of the HEMP-induced terminal voltage. 2) Model HEMP scenarios and calculate the E1 coupled voltage profiles seen by all system components. Model the same HEMP scenarios and calculate the transformer reactive power consumption profiles due to E3. 3) Sample each component failure PDF to determine which grid components will fail due to the E1 voltage spike in each scenario. 4) Perform dynamic simulations that incorporate the predicted component failures from E1 and the reactive power consumption at each transformer affected by E3. These simulations allow secondary transients to affect the relays and protection remaining in service, which can lead to cascading outages.
5) Identify the locations and amount of load lost for each scenario through grid dynamic simulation. This can indicate the immediate grid impacts of a HEMP event. In addition, perform more detailed analysis to determine critical nodes and system trends. 6) To help assess the longer-term impacts, run a security-constrained alternating current optimal power flow (ACOPF) to maximize the critical load served. This report describes a modeling framework to assess the systemic grid impacts of a HEMP event. This stochastic simulation framework generates a large amount of data for each Monte Carlo replication, including HEMP location and characteristics, relay and component failures, E3 geomagnetically induced current (GIC) profiles, cascading dynamics including voltage and frequency over time, and the final system state. These data can then be analyzed to identify trends, e.g., unique system behavior modes or critical components whose failure is more likely to cause serious systemic effects. The proposed analysis process is demonstrated on a representative system. Drawing realistic conclusions about the impact of a HEMP event on the grid will require significant further work on modeling the impact on various grid components.
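The sampling idea in steps 1) through 3) above can be illustrated with a minimal sketch. It is not the report's implementation. The lognormal fragility curve, its parameters `median_kv` and `beta`, the component names, and the function names are all hypothetical, and a single fixed E1 coupled voltage per component stands in for the scenario-specific voltage profiles that the framework would calculate.

```python
import math
import random


def failure_probability(voltage_kv, median_kv=50.0, beta=0.4):
    """Hypothetical fragility curve: P(component fails | E1 terminal voltage).

    Assumption (not from the report): a lognormal CDF with median
    `median_kv` and log-standard deviation `beta`.
    """
    if voltage_kv <= 0.0:
        return 0.0
    z = math.log(voltage_kv / median_kv) / beta
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sample_failures(coupled_voltages_kv, rng):
    """Draw one Bernoulli failure outcome per component for one replication."""
    return {
        name: rng.random() < failure_probability(voltage)
        for name, voltage in coupled_voltages_kv.items()
    }


def estimate_failure_rates(coupled_voltages_kv, n_replications=1000, seed=0):
    """Estimate per-component failure frequency over Monte Carlo replications."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    counts = {name: 0 for name in coupled_voltages_kv}
    for _ in range(n_replications):
        for name, failed in sample_failures(coupled_voltages_kv, rng).items():
            counts[name] += int(failed)
    return {name: count / n_replications for name, count in counts.items()}


if __name__ == "__main__":
    # Hypothetical E1 coupled terminal voltages (kV) for three relays.
    voltages = {"relay_A": 20.0, "relay_B": 50.0, "relay_C": 120.0}
    print(estimate_failure_rates(voltages))
```

In the full framework, the set of failed components from each replication would be passed, together with the E3 transformer reactive power consumption, to the dynamic simulation in step 4); that stage is outside the scope of this sketch.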