Publications

7 Results
Skip to search filters

Property testing on product distributions: Optimal testers for bounded derivative properties

Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms

Chakrabarty, Deeparnab; Dixit, Kashyap; Jha, Madhav; Comandur, Seshadhri C.

The primary problem in property testing is to decide whether a given function satisfies a certain property, or is far from any function satisfying it. This crucially requires a notion of distance between functions. The most prevalent notion is the Hamming distance over the uniform distribution on the domain. This restriction to uniformity is rather limiting, and it is important to investigate distances induced by more general distributions. In this paper, we give simple and optimal testers for bounded derivative properties over arbitrary product distributions. Bounded derivative properties include fundamental properties such as monotonicity and Lipschitz continuity. Our results subsume almost all known results (upper and lower bounds) on monotonicity and Lipschitz testing. We prove an intimate connection between bounded derivative property testing and binary search trees (BSTs). We exhibit a tester whose query complexity is the sum of expected depths of optimal BSTs for each marginal. Furthermore, we show this sum-of-depths is also a lower bound. A technical contribution of our work is an optimal dimension reduction theorem for all bounded derivative properties, which relates the distance of a function from the property to the distance of restrictions of the function to random lines. Such a theorem has been elusive even for monotonicity, and our theorem is an exponential improvement to the previous best known result.

More Details

A space efficient streaming algorithm for triangle counting using the birthday paradox

Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Jha, Madhav; Comandur, Seshadhri C.; Pinar, Ali

We design a space efficient algorithm that approximates the transitivity (global clustering coefficient) and total triangle count with only a single pass through a graph given as a stream of edges. Our procedure is based on the classic probabilistic result, the birthday paradox. When the transitivity is constant and there are more edges than wedges (common properties for social networks), we can prove that our algorithm requires O( p n) space (n is the number of vertices) to provide accurate estimates. We run a detailed set of experiments on a variety of real graphs and demonstrate that the memory requirement of the algorithm is a tiny fraction of the graph. For example, even for a graph with 200 million edges, our algorithm stores just 60,000 edges to give accurate results. Being a single pass streaming algorithm, our procedure also maintains a real-Time estimate of the transitivity/number of triangles of a graph, by storing a miniscule fraction of edges.

More Details
7 Results
7 Results