Merging Event Catalogs Using Agglomerative Hierarchical Clustering
Abstract not provided.
The National Nuclear Security Administration is creating a "Knowledge Base" to store technical information to support the United States nuclear explosion monitoring mission. This guide is intended to be used by researchers who wish to contribute their work to the "Knowledge Base". It provides definitions of the kinds of data sets or research products in the "Knowledge Base", acceptable data formats, and templates to complete to facilitate the documentation necessary for the "Knowledge Base".
The process of developing the National Nuclear Security Administration (NNSA) Knowledge Base (KB) must result in high-quality Information Products in order to support activities for monitoring nuclear explosions consistent with United States treaty and testing moratoria monitoring missions. The validation, verification, and management of the Information Products are critical to successful scientific integration and, hence, will enable high-quality deliveries to be made to the United States National Data Center (USNDC) at the Air Force Technical Applications Center (AFTAC). As an Information Product passes through the steps necessary to become part of a delivery to AFTAC, domain experts (including technical KB Working Groups that comprise NNSA and DOE laboratory staff and the customer) will provide coordination and validation, where validation is the determination of relevance and scientific quality. Verification is the check for completeness and correctness, and will be performed by both the Knowledge Base Integrator and the Scientific Integrator with support from the Contributor, providing two levels of testing to assure content integrity and performance. The Information Products and their contained data sets will be systematically tracked through the integration portion of their life cycle. The integration process, based on lessons learned during its initial implementations, is presented in this report.
The National Nuclear Security Administration is creating a Knowledge Base to store technical information to support the United States nuclear explosion monitoring mission. This document defines the core database tables that are used in the Knowledge Base. The purpose of this document is to present the ORACLE database tables in the NNSA Knowledge Base that are based on modifications to the CSS3.0 Database Schema developed in 1990 (Anderson et al., 1990). These modifications include additional columns to the affiliation table, an increase in the internal ORACLE format from 8 integers to 9 integers for thirteen IDs, and new primary and unique key definitions for six tables. It is intended to be used as a reference by researchers inside and outside of NNSA/DOE as they compile information to submit to the NNSA Knowledge Base. These "core" tables are separated into two groups. The Primary tables are dynamic and consist of information that can be used in automatic and interactive processing (e.g., arrivals, locations). The Lookup tables change infrequently and hold auxiliary information used by the processing. In general, the information stored in the core tables consists of: arrivals; events, origins, and associations of arrivals; magnitude information; station information (networks, site descriptions, instrument responses); pointers to waveform data; and comments pertaining to the information. This document is divided into four sections, the first being this introduction. Section two defines the sixteen tables that make up the core tables of the NNSA Knowledge Base database. Both internal (ORACLE) and external formats for the attributes are defined, along with a short description of each attribute. In addition, the primary, unique, and foreign keys are defined. Section three of the document shows the relationships between the different tables by using entity-relationship diagrams. The last section defines the columns or attributes of the various tables. The information included for each attribute is the Not Applicable (NA) value, the format of the data, and the applicable range.
Event catalogs for seismic data can become very large. Furthermore, as researchers collect multiple catalogs and reconcile them into a single catalog that is stored in a relational database, the reconciled set becomes even larger. The sheer number of these events makes it difficult to search for relevant events to compare with events of interest. Information overload in this form can lead to the data sets being under-utilized or used incorrectly or inconsistently. Thus, efforts have been initiated to research techniques and strategies for helping researchers make better use of large data sets. In this paper, the authors present their efforts to do so in two ways: (1) the Event Search Engine, which is a waveform correlation tool, and (2) some content analysis tools, which are a combination of custom-built and commercial off-the-shelf tools for accessing, managing, and querying seismic data stored in a relational database. The current Event Search Engine is based on a hierarchical clustering tool known as the dendrogram tool, which is written as a MatSeis graphical user interface. The dendrogram tool allows the user to build dendrogram diagrams for a set of waveforms by controlling phase windowing, down-sampling, filtering, enveloping, and the clustering method (e.g., single linkage, complete linkage, flexible method). It also allows the clustering to be based on two or more stations simultaneously, which is important for bridging gaps in the sparsely recorded event sets anticipated in such a large reconciled event set. Current efforts are focusing on tools to help the researcher winnow the clusters defined using the dendrogram tool down to the minimum optimal identification set. This will become critical as the number of reference events in the reconciled event set continues to grow. The dendrogram tool is part of the MatSeis analysis package, which is available on the Nuclear Explosion Monitoring Research and Engineering Program Web Site.
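The clustering idea behind a dendrogram of waveforms can be illustrated with a minimal sketch. This is not the MatSeis dendrogram tool: it assumes equal-length, already windowed and filtered single-station traces, uses one minus the zero-lag correlation coefficient as the distance, and supports only single and complete linkage (the flexible method and multi-station clustering are not covered). The function names `correlation`, `agglomerate`, and `cut`, and the synthetic traces, are hypothetical and chosen only for illustration.

```python
import math


def correlation(a, b):
    """Zero-lag normalized correlation coefficient of two equal-length traces."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a)
                    * sum((y - mean_b) ** 2 for y in b))
    return num / den if den else 0.0


def agglomerate(traces, linkage="complete"):
    """Agglomerative clustering; returns the merge history as
    a list of (cluster_a, cluster_b, distance) tuples."""
    n = len(traces)
    # Pairwise distance: 1 - correlation (assumed metric, 0 = identical shape).
    dist = {(i, j): 1.0 - correlation(traces[i], traces[j])
            for i in range(n) for j in range(i + 1, n)}
    pick = max if linkage == "complete" else min  # complete vs. single linkage

    def cluster_distance(c1, c2):
        return pick(dist[(min(i, j), max(i, j))] for i in c1 for j in c2)

    clusters = [(i,) for i in range(n)]
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of clusters under the chosen linkage.
        d, a, b = min((cluster_distance(a, b), a, b)
                      for k, a in enumerate(clusters)
                      for b in clusters[k + 1:])
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(tuple(sorted(a + b)))
        merges.append((a, b, d))
    return merges


def cut(merges, n, threshold):
    """Clusters left after applying only merges at or below the threshold.
    Valid because single/complete linkage merge distances never decrease."""
    clusters = {(i,) for i in range(n)}
    for a, b, d in merges:
        if d > threshold:
            break
        clusters -= {a, b}
        clusters.add(tuple(sorted(a + b)))
    return sorted(clusters)


# Synthetic stand-ins for waveforms: two families of similar traces.
samples = range(200)
demo_traces = [
    [math.sin(0.3 * k) for k in samples],
    [math.sin(0.3 * k) + 0.1 * math.cos(1.7 * k) for k in samples],
    [math.sin(0.9 * k) for k in samples],
    [math.sin(0.9 * k) + 0.1 * math.cos(2.3 * k) for k in samples],
]
merges = agglomerate(demo_traces, linkage="complete")
print(cut(merges, len(demo_traces), threshold=0.5))
```

The merge history is the information a dendrogram plots: each entry records which two clusters were joined and at what distance. Cutting the history at a distance threshold yields groups of mutually similar events, which is one simple way to reduce a large reference set to a few representative clusters.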
As part of the research into how to winnow the reference events in these large reconciled event sets, additional database query approaches have been developed to provide windows into these datasets. These custom-built content analysis tools help identify dataset characteristics that can potentially provide a basis for comparing similar reference events in these large reconciled event sets. Once these characteristics are identified, algorithms can be developed to create and add to the reduced set of events used by the Event Search Engine. These content analysis tools have already been useful in providing information on station coverage of the reference events and basic statistical information on events in the research datasets. The tools can also provide researchers with a quick way to find interesting and useful events within the research datasets, and they could be used to review reference event datasets as part of a dataset delivery verification process. There has also been an effort to explore the usefulness of commercially available web-based software to help with this problem. The advantages of using off-the-shelf software applications, such as Oracle's WebDB, to manipulate, customize, and manage research data are being investigated. These types of applications are being examined to provide access to large integrated data sets for regional seismic research in Asia. All of these software tools would give the researcher considerable power without requiring them to learn the intricacies and complexities of relational database systems.
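A station-coverage query of the kind described above might look like the following sketch. It is an illustration only, not the authors' content analysis tools: it uses an in-memory SQLite database instead of ORACLE, and the `origin` and `assoc` tables here are heavily simplified, hypothetical stand-ins for the corresponding core tables. The station codes and rows are made-up placeholder data, and the helper names `station_coverage` and `well_recorded` are assumptions.

```python
import sqlite3


def station_coverage(conn):
    """Number of distinct stations with associated arrivals, per origin."""
    return conn.execute(
        """
        SELECT o.orid, COUNT(DISTINCT a.sta) AS nsta
        FROM origin AS o
        LEFT JOIN assoc AS a ON a.orid = o.orid
        GROUP BY o.orid
        ORDER BY nsta DESC, o.orid
        """
    ).fetchall()


def well_recorded(conn, min_stations):
    """Origin IDs recorded by at least min_stations distinct stations."""
    return [orid for orid, nsta in station_coverage(conn)
            if nsta >= min_stations]


# Hypothetical, simplified schema and placeholder rows for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE origin (orid INTEGER PRIMARY KEY, lat REAL, lon REAL, time REAL);
    CREATE TABLE assoc (arid INTEGER PRIMARY KEY, orid INTEGER, sta TEXT, phase TEXT);
    INSERT INTO origin VALUES (1, 41.5, 88.7, 0.0);
    INSERT INTO origin VALUES (2, 41.6, 88.8, 100.0);
    INSERT INTO origin VALUES (3, 30.0, 70.0, 200.0);
    INSERT INTO assoc VALUES (10, 1, 'STA1', 'P');
    INSERT INTO assoc VALUES (11, 1, 'STA2', 'P');
    INSERT INTO assoc VALUES (12, 1, 'STA2', 'S');
    INSERT INTO assoc VALUES (13, 2, 'STA1', 'P');
    """
)
coverage = station_coverage(conn)
for orid, nsta in coverage:
    print(f"origin {orid}: {nsta} station(s)")
```

The `LEFT JOIN` keeps origins that have no associated arrivals, so events with zero coverage show up in the summary instead of silently disappearing. A filter such as `well_recorded(conn, 2)` is one simple way to shortlist events with enough station coverage to be candidates for waveform comparison.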