Community Detection for Large Graphs
Abstract not provided.
Abstract not provided.
Journal of Complex Networks
Abstract not provided.
Abstract not provided.
This report summarizes the work performed under the project (3z(BStatitically significant relational data mining.(3y (BThe goal of the project was to add more statistical rigor to the fairly ad hoc area of data mining on graphs. Our goal was to develop better algorithms and better ways to evaluate algorithm quality. We concetrated on algorithms for community detection, approximate pattern matching, and graph similarity measures. Approximate pattern matching involves finding an instance of a relatively small pattern, expressed with tolerance, in a large graph of data observed with uncertainty. This report gathers the abstracts and references for the eight refereed publications that have appeared as part of this work. We then archive three pieces of research that have not yet been published. The first is theoretical and experimental evidence that a popular statistical measure for comparison of community assignments favors over-resolved communities over approximations to a ground truth. The second are statistically motivated methods for measuring the quality of an approximate match of a small pattern in a large graph. The third is a new probabilistic random graph model. Statisticians favor these models for graph analysis. The new local structure graph model overcomes some of the issues with popular models such as exponential random graph models and latent variable models.
Abstract not provided.
Statistical Analysis and Data Mining
Graphs are used to model interactions in a variety of contexts, and there is a growing need to quickly assess the structure of such graphs. Some of the most useful graph metrics are based on triangles, such as those measuring social cohesion. Algorithms to compute them can be extremely expensive, even for moderately sized graphs with only millions of edges. Previous work has considered node and edge sampling; in contrast, we consider wedge sampling, which provides faster and more accurate approximations than competing techniques. Additionally, wedge sampling enables estimating local clustering coefficients, degree-wise clustering coefficients, uniform triangle sampling, and directed triangle counts. Our methods come with provable and practical probabilistic error estimates for all computations. We provide extensive results that show our methods are both more accurate and faster than state-of-the-art alternatives. © 2014 Wiley Periodicals, Inc.
IEEE Transactions on Power Systems
We consider the problem of designing (or augmenting) an electric power system at a minimum cost such that it satisfies the N-k-survivability criterion. This survivability criterion is a generalization of the well-known N-k criterion, and it requires that at least 1-j fraction of the steady-state demand be met after failures of j components, for j=0,1,,k. The network design problem adds another level of complexity to the notoriously hard contingency analysis problem, since the contingency analysis is only one of the requirements for the design optimization problem. We present a mixed-integer programming formulation of this problem that takes into account both transmission and generation expansion. We propose an algorithm that can avoid combinatorial explosion in the number of contingencies, by seeking vulnerabilities in intermediary solutions and constraining the design space accordingly. Our approach is built on our ability to identify such system vulnerabilities quickly. Our empirical studies on modified instances of the IEEE 30-bus and IEEE 57-bus systems show the effectiveness of our methods. We were able to solve the transmission and generation expansion problems for k=4 in approximately 30 min, while other approaches failed to provide a solution at the end of 2 h. © 2014 IEEE.
Abstract not provided.
Abstract not provided.
ACM Transactions on Knowledge Discovery form Data
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Proposed for publication in arXiv.
Abstract not provided.
Proposed for publication in arXiv.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Physical Review E - Statistical, Nonlinear, and Soft Matter Physics
Community structure plays a significant role in the analysis of social networks and similar graphs, yet this structure is little understood and not well captured by most models. We formally define a community to be a subgraph that is internally highly connected and has no deeper substructure. We use tools of combinatorics to show that any such community must contain a dense Erdos-Rényi (ER) subgraph. Based on mathematical arguments, we hypothesize that any graph with a heavy-tailed degree distribution and community structure must contain a scale-free collection of dense ER subgraphs. These theoretical observations corroborate well with empirical evidence. From this, we propose the Block Two-Level Erdos-Rényi (BTER) model, and demonstrate that it accurately captures the observable properties of many real-world social networks. © 2012 American Physical Society.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Abstract not provided.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
We study clustering on graphs with multiple edge types. Our main motivation is that similarities between objects can be measured in many different metrics, and so allowing graphs with multivariate edges significantly increases modeling power. In this context the clustering problem becomes more challenging. Each edge/metric provides only partial information about the data; recovering full information requires aggregation of all the similarity metrics. We generalize the concept of clustering in single-edge graphs to multi-edged graphs and discuss how this generates a space of clusterings. We describe a meta-clustering structure on this space and propose methods to compactly represent the meta-clustering structure. Experimental results on real and synthetic data are presented. © 2011 Springer-Verlag.
Abstract not provided.
Abstract not provided.