Publications

A Mountaintop View Requires Minimal Sorting: A Faster Contour Tree Algorithm

Comandur, Seshadhri C.; Raichel, Benjamin A.

Consider a scalar field f : M → R, where M is a triangulated simplicial mesh in R^d. A level set, or contour, at value v is a connected component of f⁻¹(v). As v is changed, these contours change topology, merge into each other, or split. Contour trees are concise representations of f that track this contour behavior. The vertices of these trees are the critical points of f, where the gradient is zero. The edges represent changes in the topology of contours. It is a fundamental data structure in data analysis and visualization, and there is significant previous work (both theoretical and practical) on algorithms for constructing contour trees. Suppose M has n vertices, N facets, and t critical points. A classic result of Carr, Snoeyink, and Axen (2000) gives an algorithm that takes O(n log n + Nα(N)) time (where α(·) is the inverse Ackermann function). A further improvement to O(t log t + N) time was given by Chiang et al. All these algorithms involve a global sort of the critical points, a significant computational bottleneck. Unfortunately, lower bounds of Ω(t log t) also exist. We present the first algorithm that can avoid the global sort and has a refined time complexity that depends on the contour tree structure. Intuitively, if the tree is short and fat, we get significant improvements in running time. For a partition of the contour tree into a set of descending paths, P, our algorithm runs in O(Σ_{p∈P} |p| log |p| + tα(t) + N) time. This is at most O(t log D + N), where D is the diameter of the contour tree. Moreover, it is O(tα(t) + N) for balanced trees, a significant improvement over the previous complexity. Our algorithm requires numerous ideas: partitioning the contour tree into join and split trees, a local growing procedure to iteratively build contour trees, and the use of heavy path decompositions for the time complexity analysis. There is a crucial use of a family of binomial heaps to maintain priorities, ensuring that any comparison made is between comparable nodes of the contour tree. We also prove lower bounds showing that the Σ_{p∈P} |p| log |p| complexity is inherent to computing contour trees.
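
For context on where the global sort enters, here is a minimal sketch of the classic sorted-sweep construction of the join tree (one of the two trees the contour tree is split into). The global sort on the first line of join_tree is exactly the bottleneck the paper's algorithm avoids; the mesh representation and function names are illustrative, and vertex values are assumed distinct.

```python
# Minimal sketch of the classic sorted-sweep join-tree construction
# (in the style of Carr, Snoeyink, and Axen). Regular (non-critical)
# vertices can be contracted from the output afterwards.

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x
    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def join_tree(values, neighbors):
    """values[v]: scalar f(v); neighbors[v]: vertices adjacent to v in M.
    Returns join-tree arcs as (higher node, lower node) pairs."""
    order = sorted(range(len(values)), key=lambda v: -values[v])  # global sort
    uf = UnionFind(len(values))
    rep = {}            # UF root -> tree node last attached to that component
    edges, seen = [], set()
    for v in order:     # sweep downward from the top of the range
        rep[uf.find(v)] = v
        for u in neighbors[v]:
            if u in seen and uf.find(u) != uf.find(v):
                edges.append((rep[uf.find(u)], v))  # two components join at v
                uf.union(u, v)
                rep[uf.find(v)] = v
        seen.add(v)
    return edges

# Toy star mesh: three maxima (0, 1, 3) joining at the minimum (2).
vals = [4.0, 3.0, 1.0, 2.0]
nbrs = {0: [2], 1: [2], 2: [0, 1, 3], 3: [2]}
print(join_tree(vals, nbrs))   # [(0, 2), (1, 2), (3, 2)]
```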

Directed closure measures for networks with reciprocity

Journal of Complex Networks

Comandur, Seshadhri C.; Pinar, Ali P.; Durak, Nurcan; Kolda, Tamara G.

The study of triangles in graphs is a standard tool in network analysis, leading to measures such as the transitivity, i.e., the fraction of paths of length two that participate in triangles. Real-world networks are often directed, however, and it is not obvious how to carry such measures over meaningfully. We propose a collection of directed closure values for measuring triangles in directed graphs in a way that is analogous to transitivity in an undirected graph. Our study of these values reveals much information about directed triadic closure. For instance, we immediately see that reciprocal edges have a high propensity to participate in triangles. We also observe striking similarities between the triadic closure patterns of different web and social networks. We perform mathematical and empirical analysis showing that directed configuration models that preserve reciprocity cannot capture the triadic closure patterns of real networks.
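
To make the analogy concrete, below is a minimal sketch of one such closure value: the fraction of directed two-paths u → v → w whose endpoints are joined by an edge in either direction. The paper's actual measures split this out by edge type (one-way versus reciprocal); the function name and toy graph here are illustrative.

```python
# Simplified directed closure: the fraction of directed two-paths
# u -> v -> w (u != w) that are closed by an edge between u and w.

from itertools import product

def directed_closure(out_adj):
    """out_adj[v]: set of out-neighbors of v (every node is a key)."""
    nodes = out_adj.keys()
    edge = lambda a, b: b in out_adj.get(a, set())
    paths = closed = 0
    for v in nodes:
        in_nbrs = [u for u in nodes if edge(u, v)]
        for u, w in product(in_nbrs, out_adj[v]):
            if u == w:
                continue
            paths += 1                         # directed two-path u -> v -> w
            if edge(w, u) or edge(u, w):       # closed in either direction
                closed += 1
    return closed / paths if paths else 0.0

toy = {0: {1}, 1: {2}, 2: {0}, 3: {0, 1}}
print(directed_closure(toy))   # 4 of 5 two-paths closed -> 0.8
```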

Why do simple algorithms for triangle enumeration work in the real world?

Internet Mathematics

Berry, Jonathan W.; Fostvedt, Luke A.; Nordman, Daniel J.; Phillips, Cynthia A.; Comandur, Seshadhri C.; Wilson, Alyson G.

Listing all triangles is a fundamental graph operation. Triangles can have important interpretations in real-world graphs, especially social and other interaction networks. Despite the lack of provably efficient (linear, or slightly superlinear) worst-case algorithms for this problem, practitioners run simple, efficient heuristics to find all triangles in graphs with millions of vertices. How are these heuristics exploiting the structure of these special graphs to provide major speedups in running time? We study one of the most prevalent algorithms used by practitioners. A trivial algorithm enumerates all paths of length 2, and checks if each such path is incident to a triangle. A good heuristic is to enumerate only those paths of length 2 in which the middle vertex has the lowest degree. It is easily implemented and is empirically known to give remarkable speedups over the trivial algorithm. We study the behavior of this algorithm over graphs with heavy-tailed degree distributions, a defining feature of real-world graphs. The erased configuration model (ECM) efficiently generates a graph with asymptotically (almost) any desired degree sequence. We show that the expected running time of this algorithm over the distribution of graphs created by the ECM is controlled by the ℓ_{4/3}-norm of the degree sequence. Norms of the degree sequence are a measure of the heaviness of the tail, and it is precisely this feature that allows nontrivial speedups of simple triangle enumeration algorithms. As a corollary of our main theorem, we prove expected linear-time performance for degree sequences following a power law with exponent α ≥ 7/3, and nontrivial speedup whenever α ∈ (2, 3).
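
A minimal sketch of the heuristic in question, assuming the graph is given as a dictionary of adjacency sets (a toy input): each triangle is listed from the wedge centered at its lowest-degree vertex, with ties broken by vertex id so that every triangle is emitted exactly once.

```python
# Degree-ordered triangle enumeration: only enumerate wedges whose
# center is the lowest-ranked of the three vertices.

def triangles(adj):
    deg = {v: len(adj[v]) for v in adj}
    rank = lambda v: (deg[v], v)            # degree order, ties by id
    for v in adj:                           # v: the wedge's low-degree center
        higher = sorted((u for u in adj[v] if rank(u) > rank(v)), key=rank)
        for i, u in enumerate(higher):
            for w in higher[i + 1:]:
                if w in adj[u]:             # does the wedge close?
                    yield (v, u, w)

toy = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
print(list(triangles(toy)))   # [(1, 0, 2), (3, 0, 2)]
```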

Property testing on product distributions: Optimal testers for bounded derivative properties

Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms

Chakrabarty, Deeparnab; Dixit, Kashyap; Jha, Madhav; Comandur, Seshadhri C.

The primary problem in property testing is to decide whether a given function satisfies a certain property, or is far from any function satisfying it. This crucially requires a notion of distance between functions. The most prevalent notion is the Hamming distance over the uniform distribution on the domain. This restriction to uniformity is rather limiting, and it is important to investigate distances induced by more general distributions. In this paper, we give simple and optimal testers for bounded derivative properties over arbitrary product distributions. Bounded derivative properties include fundamental properties such as monotonicity and Lipschitz continuity. Our results subsume almost all known results (upper and lower bounds) on monotonicity and Lipschitz testing. We prove an intimate connection between bounded derivative property testing and binary search trees (BSTs). We exhibit a tester whose query complexity is the sum of expected depths of optimal BSTs for each marginal. Furthermore, we show this sum-of-depths is also a lower bound. A technical contribution of our work is an optimal dimension reduction theorem for all bounded derivative properties, which relates the distance of a function from the property to the distance of restrictions of the function to random lines. Such a theorem has been elusive even for monotonicity, and our theorem is an exponential improvement to the previous best known result.
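
As a toy illustration of the BST connection, here is a sketch of a monotonicity tester on the line [n] under the uniform distribution: sample a point, walk the search path of an implicit balanced BST toward it, and compare the function value at the point against every ancestor on the path. Monotone functions never violate these comparisons, and by the classic analysis, functions ε-far from monotone are caught with constant probability after O(1/ε) path trials. The sample budget and names are illustrative; the paper's tester replaces the balanced tree with an optimal BST for the given product distribution.

```python
import random

def bst_path_violation(f, n, x):
    """Walk the implicit balanced-BST search path toward x over [0, n)
    and report any monotonicity violation between x and an ancestor."""
    lo, hi = 0, n - 1
    while lo <= hi:
        mid = (lo + hi) // 2                # ancestor on the search path to x
        if (mid < x and f(mid) > f(x)) or (mid > x and f(mid) < f(x)):
            return True                     # monotonicity violated on the path
        if mid == x:
            return False
        lo, hi = (mid + 1, hi) if x > mid else (lo, mid - 1)
    return False

def monotonicity_tester(f, n, eps):
    for _ in range(int(10 / eps)):          # O(1/eps) path trials
        if bst_path_violation(f, n, random.randrange(n)):
            return "reject"
    return "accept"

print(monotonicity_tester(lambda i: i % 7, 1000, 0.1))   # far from monotone
print(monotonicity_tester(lambda i: i // 3, 1000, 0.1))  # monotone
```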

Finding Hierarchical and Overlapping Dense Subgraphs using Nucleus Decompositions

Comandur, Seshadhri C.; Pinar, Ali P.; Sariyuce, Ahmet E.; Catalyurek, Umit V.

Finding dense substructures in a graph is a fundamental graph mining operation, with applications in bioinformatics, social networks, and visualization, to name a few. Yet most standard formulations of this problem (like clique, quasi-clique, k-densest subgraph) are NP-hard. Furthermore, the goal is rarely to find the "true optimum", but to identify many (if not all) dense substructures, understand their distribution in the graph, and ideally determine a hierarchical structure among them. Current dense subgraph finding algorithms usually optimize some objective, and only find a few such subgraphs without providing any hierarchy. It is also not clear how to account for overlaps in dense substructures. We define the nucleus decomposition of a graph, which represents the graph as a forest of nuclei. Each nucleus is a subgraph where smaller cliques are present in many larger cliques. The forest of nuclei is a hierarchy by containment, where the edge density increases as we proceed towards leaf nuclei. Sibling nuclei can have limited intersections, which allows for discovery of overlapping dense subgraphs. With the right parameters, the nucleus decomposition generalizes the classic notions of k-cores and k-trusses. We give provably efficient algorithms for nucleus decompositions, and empirically evaluate their behavior in a variety of real graphs. The tree of nuclei consistently gives a global, hierarchical snapshot of dense substructures, and outputs dense subgraphs of higher quality than other state-of-the-art solutions. Our algorithm can process graphs with tens of millions of edges in less than an hour.
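
Since the decomposition generalizes k-cores, the flavor of the underlying computation can be seen in the classic core-number peeling algorithm below (a minimal sketch on a toy graph; the paper's nucleus algorithms apply the same peel-by-minimum pattern with small cliques in place of vertices and edges).

```python
# Core numbers by repeated peeling of a minimum-degree vertex,
# using a lazy heap with stale-entry skipping.

import heapq

def core_numbers(adj):
    deg = {v: len(adj[v]) for v in adj}
    heap = [(d, v) for v, d in deg.items()]
    heapq.heapify(heap)
    removed, core, k = set(), {}, 0
    while heap:
        d, v = heapq.heappop(heap)
        if v in removed or d != deg[v]:
            continue                 # stale heap entry
        k = max(k, d)
        core[v] = k                  # v's core number
        removed.add(v)
        for u in adj[v]:
            if u not in removed:
                deg[u] -= 1
                heapq.heappush(heap, (deg[u], u))
    return core

toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(core_numbers(toy))   # triangle {0, 1, 2} is the 2-core; 3 is in the 1-core
```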

Wedge sampling for computing clustering coefficients and triangle counts on large graphs

Statistical Analysis and Data Mining

Comandur, Seshadhri C.; Pinar, Ali P.; Kolda, Tamara G.

Graphs are used to model interactions in a variety of contexts, and there is a growing need to quickly assess the structure of such graphs. Some of the most useful graph metrics are based on triangles, such as those measuring social cohesion. Algorithms to compute them can be extremely expensive, even for moderately sized graphs with only millions of edges. Previous work has considered node and edge sampling; in contrast, we consider wedge sampling, which provides faster and more accurate approximations than competing techniques. Additionally, wedge sampling enables estimating local clustering coefficients, degree-wise clustering coefficients, uniform triangle sampling, and directed triangle counts. Our methods come with provable and practical probabilistic error estimates for all computations. We provide extensive results that show our methods are both more accurate and faster than state-of-the-art alternatives.
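
A minimal sketch of the core estimator: pick wedge centers with probability proportional to each vertex's wedge count, check whether the sampled wedge closes into a triangle, and average. The function name and sample size are illustrative.

```python
import random

def transitivity_estimate(adj, samples=10000):
    verts = list(adj)
    # vertex v holds C(deg(v), 2) wedges; sample centers proportionally
    weights = [len(adj[v]) * (len(adj[v]) - 1) // 2 for v in verts]
    closed = 0
    for _ in range(samples):
        v = random.choices(verts, weights=weights)[0]
        u, w = random.sample(sorted(adj[v]), 2)   # uniform wedge at center v
        closed += w in adj[u]                     # is the wedge closed?
    return closed / samples    # unbiased estimate of the transitivity

toy = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
print(transitivity_estimate(toy))   # exact transitivity is 6/8 = 0.75
```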

FAST-PPR: Scaling personalized pagerank estimation for large graphs

Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Lofgren, Peter A.; Banerjee, Siddhartha; Goel, Ashish; Comandur, Seshadhri C.

We propose a new algorithm, FAST-PPR, for computing personalized PageRank: given a start node s and a target node t in a directed graph, and given a threshold δ, it computes the personalized PageRank π_s(t) from s to t, guaranteeing that the relative error is small as long as π_s(t) > δ. Existing algorithms for this problem have a running time of Ω(1/δ); in comparison, FAST-PPR has a provable average running-time guarantee of O(√(d/δ)) (where d is the average in-degree of the graph). This is a significant improvement, since δ is often O(1/n) (where n is the number of nodes) for applications. We also complement the algorithm with an Ω(1/√δ) lower bound for PageRank estimation, showing that the dependence on δ cannot be improved. We perform a detailed empirical study on numerous massive graphs, showing that FAST-PPR dramatically outperforms existing algorithms. For example, on the 2010 Twitter graph with 1.5 billion edges, for target nodes sampled by popularity, FAST-PPR has a factor-20 speedup over the state of the art. Furthermore, an enhanced version of FAST-PPR has a factor-160 speedup on the Twitter graph, and is at least 20 times faster on all our candidate graphs.
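
For orientation, a sketch of the plain forward random-walk estimator is below. This is only the Monte Carlo half of the picture: FAST-PPR pairs such forward walks from s with a reverse local-push expansion around t to reach its O(√(d/δ)) average running time. The graph, teleport probability, and walk count are toy values.

```python
import random

def ppr_monte_carlo(out_adj, s, t, alpha=0.2, walks=20000):
    """Estimate pi_s(t): the probability that a random walk from s,
    terminating at each step with probability alpha, ends at t.
    Accurate only once walks >> 1/pi_s(t), hence the Omega(1/delta) cost."""
    hits = 0
    for _ in range(walks):
        v = s
        while random.random() > alpha:       # continue with prob 1 - alpha
            if not out_adj[v]:
                v = s                        # dangling node: restart at s
            else:
                v = random.choice(out_adj[v])
        hits += (v == t)
    return hits / walks

toy = {0: [1, 2], 1: [2], 2: [0]}
print(ppr_monte_carlo(toy, s=0, t=2))
```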

A scalable generative graph model with community structure

SIAM Journal on Scientific Computing

Kolda, Tamara G.; Pinar, Ali; Plantenga, Todd P.; Comandur, Seshadhri C.

Network data is ubiquitous and growing, yet we lack realistic generative network models that can be calibrated to match real-world data. The recently proposed block two-level Erdős–Rényi (BTER) model can be tuned to capture two fundamental properties: degree distribution and clustering coefficients. The latter is particularly important for reproducing graphs with community structure, such as social networks. In this paper, we compare BTER to other scalable models and show that it gives a better fit to real data. We provide a scalable implementation that requires only O(d_max) storage, where d_max is the maximum number of neighbors for a single node. The generator is trivially parallelizable, and we show results for a Hadoop MapReduce implementation for modeling a real-world Web graph with over 4.6 billion edges. We propose that the BTER model can be used as a graph generator for benchmarking purposes and provide idealized degree distributions and clustering coefficient profiles that can be tuned for user specifications.
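
The two-level idea can be sketched roughly as follows: group similar-degree nodes into small affinity blocks, fill each block with dense Erdős–Rényi edges (driving the clustering coefficients), then spend each node's leftover degree in a random stub matching (driving the degree distribution). The sketch below is a heavily simplified caricature with illustrative names and parameters; the actual BTER model weights blocks and excess degrees much more carefully.

```python
import random
from collections import Counter

def bter_like(degrees, block_density=0.9):
    """degrees[v]: target degree of node v (toy input). Returns an edge set."""
    nodes = sorted(range(len(degrees)), key=lambda v: degrees[v])
    edges = set()
    def add(a, b):
        if a != b:
            edges.add((min(a, b), max(a, b)))
    # Phase 1: affinity blocks of d+1 similar-degree nodes, dense ER inside.
    i = 0
    while i < len(nodes):
        d = max(degrees[nodes[i]], 1)
        block = nodes[i:i + d + 1]
        for j, a in enumerate(block):
            for b in block[j + 1:]:
                if random.random() < block_density:
                    add(a, b)
        i += len(block)
    # Phase 2: spend leftover degree in a random stub matching (Chung-Lu style).
    realized = Counter()
    for a, b in edges:
        realized[a] += 1
        realized[b] += 1
    stubs = [v for v in nodes for _ in range(max(degrees[v] - realized[v], 0))]
    random.shuffle(stubs)
    for a, b in zip(stubs[::2], stubs[1::2]):
        add(a, b)
    return edges

print(sorted(bter_like([2, 2, 2, 3, 3, 3, 5, 5, 5, 5])))
```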

An optimal lower bound for monotonicity testing over hypergrids

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Chakrabarty, Deeparnab; Comandur, Seshadhri C.

For positive integers n, d, consider the hypergrid [n]^d with the coordinate-wise product partial ordering denoted by ≺. A function f : [n]^d → ℕ is monotone if ∀x ≺ y, f(x) ≤ f(y). A function f is ε-far from monotone if at least an ε-fraction of values must be changed to make f monotone. Given a parameter ε, a monotonicity tester must distinguish with high probability a monotone function from one that is ε-far. We prove that any (adaptive, two-sided) monotonicity tester for functions f : [n]^d → ℕ must make Ω(ε⁻¹ d log n − ε⁻¹ log ε⁻¹) queries. Recent upper bounds show the existence of O(ε⁻¹ d log n) query monotonicity testers for hypergrids. This closes the question of monotonicity testing for hypergrids over arbitrary ranges. The previous best lower bound for general hypergrids was a non-adaptive bound of Ω(d log n).

A space efficient streaming algorithm for triangle counting using the birthday paradox

Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Jha, Madhav; Comandur, Seshadhri C.; Pinar, Ali

We design a space efficient algorithm that approximates the transitivity (global clustering coefficient) and total triangle count with only a single pass through a graph given as a stream of edges. Our procedure is based on a classic probabilistic result, the birthday paradox. When the transitivity is constant and there are more edges than wedges (common properties for social networks), we can prove that our algorithm requires O(√n) space (n is the number of vertices) to provide accurate estimates. We run a detailed set of experiments on a variety of real graphs and demonstrate that the memory requirement of the algorithm is a tiny fraction of the graph. For example, even for a graph with 200 million edges, our algorithm stores just 60,000 edges to give accurate results. Being a single-pass streaming algorithm, our procedure also maintains a real-time estimate of the transitivity/number of triangles of a graph, by storing a minuscule fraction of edges.
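
A condensed sketch of the two-reservoir design is below: reservoir-sample the edge stream, form wedges from pairs of sampled edges, reservoir-sample those wedges, and mark a wedge closed when a later edge completes its triangle. Since only the wedge whose two edges precede the closing edge gets marked, the closed fraction estimates a third of the transitivity (assuming randomly ordered edges). The names, caps, and the crude wedge-reservoir update rule are simplifications of the paper's algorithm.

```python
import random

def streaming_transitivity(stream, edge_cap=100, wedge_cap=100):
    edges, wedges, closed = [], [], []
    seen = 0
    for u, v in stream:
        for i, (a, b, c) in enumerate(wedges):   # wedge a-b-c with center b
            if {u, v} == {a, c}:                 # this edge closes the wedge
                closed[i] = True
        seen += 1                                # reservoir-sample the edge
        slot = len(edges) if len(edges) < edge_cap else random.randrange(seen)
        if slot < edge_cap:
            if len(edges) < edge_cap:
                edges.append((u, v))
            else:
                edges[slot] = (u, v)
            for x, y in list(edges):             # wedges the new edge forms
                shared = {u, v} & {x, y}
                if len(shared) == 1:
                    b = shared.pop()
                    a, c = ({u, v} - {b}).pop(), ({x, y} - {b}).pop()
                    if len(wedges) < wedge_cap:
                        wedges.append((a, b, c)); closed.append(False)
                    elif random.random() < 0.5:  # crude wedge-reservoir update
                        j = random.randrange(wedge_cap)
                        wedges[j] = (a, b, c); closed[j] = False
    # each triangle marks only its "earliest" wedge, hence the factor 3
    return 3.0 * sum(closed) / max(len(wedges), 1)

toy_stream = [(0, 1), (1, 2), (2, 0), (0, 3), (3, 2), (1, 3)]
print(streaming_transitivity(toy_stream, edge_cap=10, wedge_cap=20))
```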

Are we there yet? When to stop a Markov chain while generating random graphs

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Ray, Jaideep R.; Pinar, Ali; Comandur, Seshadhri C.

Markov chains are a convenient means of generating realizations of networks with a given (joint or otherwise) degree distribution, since they simply require a procedure for rewiring edges. The major challenge is to find the right number of steps to run such a chain, so that we generate truly independent samples. Theoretical bounds for the mixing times of these Markov chains are too large to be practically useful. Practitioners have no useful guide for choosing the length, and tend to pick numbers fairly arbitrarily. We give a principled mathematical argument showing that it suffices for the length to be proportional to the desired number of edges. We also prescribe a method for choosing this proportionality constant. We run a series of experiments showing that the distributions of common graph properties converge in this time, providing empirical evidence for our claims.
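
The rewiring step in question is the classic double-edge swap, sketched below: pick two random edges and exchange their endpoints, which preserves every vertex's degree. The step count in the usage line reflects the paper's prescription of a length proportional to the number of edges m, with the constant (here an illustrative 5) chosen by their method rather than arbitrarily.

```python
import random

def rewire(edges, steps):
    """edges: list of undirected edges as (u, v) tuples; rewired in place."""
    present = set(frozenset(e) for e in edges)
    for _ in range(steps):
        i, j = random.randrange(len(edges)), random.randrange(len(edges))
        (a, b), (c, d) = edges[i], edges[j]
        # propose the swap  a-b, c-d  ->  a-d, c-b  (degrees are unchanged)
        if len({a, b, c, d}) < 4:
            continue                   # shared endpoint: skip
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue                   # would create a multi-edge: skip
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
    return edges

graph = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
print(rewire(graph, steps=5 * len(graph)))   # ~C*m steps with C = 5
```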
