Publications Search

This report summarizes the work performed under the project (3z(BStatitically significant relational data mining.(3y (BThe goal of the project was to add more statistical rigor to the fairly ad hoc area of data mining on graphs. Our goal was to develop better algorithms and better ways to evaluate algorithm quality. We concetrated on algorithms for community detection, approximate pattern matching, and graph similarity measures. Approximate pattern matching involves finding an instance of a relatively small pattern, expressed with tolerance, in a large graph of data observed with uncertainty. This report gathers the abstracts and references for the eight refereed publications that have appeared as part of this work. We then archive three pieces of research that have not yet been published. The first is theoretical and experimental evidence that a popular statistical measure for comparison of community assignments favors over-resolved communities over approximations to a ground truth. The second are statistically motivated methods for measuring the quality of an approximate match of a small pattern in a large graph. The third is a new probabilistic random graph model. Statisticians favor these models for graph analysis. The new local structure graph model overcomes some of the issues with popular models such as exponential random graph models and latent variable models.

More Details

TYPE SAND Report YEAR 2014

OSTI DOI

Generating Large Graphs for benchmarking

Pinar, Ali P.; Kolda, Tamara G.; Comandur, Seshadhri C.; Plantenga, Todd P.

Abstract not provided.

More Details

TYPE Conference YEAR 2014

OSTI

Wedge sampling for computing clustering coefficients and triangle counts on large graphs

Statistical Analysis and Data Mining

Comandur, Seshadhri C.; Pinar, Ali P.; Kolda, Tamara G.

Graphs are used to model interactions in a variety of contexts, and there is a growing need to quickly assess the structure of such graphs. Some of the most useful graph metrics are based on triangles, such as those measuring social cohesion. Algorithms to compute them can be extremely expensive, even for moderately sized graphs with only millions of edges. Previous work has considered node and edge sampling; in contrast, we consider wedge sampling, which provides faster and more accurate approximations than competing techniques. Additionally, wedge sampling enables estimating local clustering coefficients, degree-wise clustering coefficients, uniform triangle sampling, and directed triangle counts. Our methods come with provable and practical probabilistic error estimates for all computations. We provide extensive results that show our methods are both more accurate and faster than state-of-the-art alternatives. © 2014 Wiley Periodicals, Inc.

More Details

TYPE Journal Article YEAR 2014

Scopus OSTI DOI

Contingency-risk informed power system design

IEEE Transactions on Power Systems

Chen, Richard L.; Cohn, Amy; Fan, Neng; Pinar, Ali P.

We consider the problem of designing (or augmenting) an electric power system at a minimum cost such that it satisfies the N-k-survivability criterion. This survivability criterion is a generalization of the well-known N-k criterion, and it requires that at least 1-j fraction of the steady-state demand be met after failures of j components, for j=0,1,,k. The network design problem adds another level of complexity to the notoriously hard contingency analysis problem, since the contingency analysis is only one of the requirements for the design optimization problem. We present a mixed-integer programming formulation of this problem that takes into account both transmission and generation expansion. We propose an algorithm that can avoid combinatorial explosion in the number of contingencies, by seeking vulnerabilities in intermediary solutions and constraining the design space accordingly. Our approach is built on our ability to identify such system vulnerabilities quickly. Our empirical studies on modified instances of the IEEE 30-bus and IEEE 57-bus systems show the effectiveness of our methods. We were able to solve the transmission and generation expansion problems for k=4 in approximately 30 min, while other approaches failed to provide a solution at the end of 2 h. © 2014 IEEE.

More Details

TYPE Journal Article YEAR 2014

Scopus OSTI