Publications / SAND Report

Accurate Characterization of Real Networks from Inaccurate Measurements

Pinar, Ali P.

Our nation's dependence on information networks makes it vital to anticipate disruptions or find weaknesses in these networks. But networks like the Internet are vast and distributed, and there is no mechanism to collect their complete structure. We are restricted to specific data collection schemes (like traceroute samples from router interfaces) that examine tiny portions of such a network. It has been empirically documented and theoretically proven that these measurements have significant biases, and direct inferences from them will be wrong. Yet these data collection mechanisms have limited flexibility and cannot be easily modified. Moreover, in many applications there are limits on how much data can be collected. How do we make accurate inferences of network properties with biased and limited measurements? The general problem this report deals with is how to work with incompletely observed networks. We present several approaches to this problem. First, we present an approach to estimate the degree distribution of a graph by sampling only a small portion of the vertices. This algorithm provides provably accurate results with a sublinear number of samples. An alternative approach is to enhance the available information by selectively collecting new information, probing for the neighbors of a vertex or for the presence of individual edges. A different setting for working with incomplete information arises when we have full access to local information but do not have any global version of the graph. Can we still identify critical nodes in such a graph? We present an approach to identify such nodes efficiently. Finally, how can we put these ideas together to identify the structure of a network? We present an approach that can complement the existing approaches for network mapping. We start with an estimate of network structure based on existing network mapping methods. Then we find a critical router in the network and use the traffic through this network to selectively collect new data to enhance our prediction.
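
To make the idea of estimating a degree distribution from a small vertex sample concrete, the sketch below shows a naive baseline: draw vertices uniformly at random, query only their neighbor lists, and report the fraction of sampled vertices with degree at least d. This is not the report's algorithm and carries none of its accuracy guarantees. The function name `estimate_degree_ccdf`, the `neighbors` callback, and the toy star graph are all hypothetical, chosen only for illustration.

```python
import random
from collections import Counter


def estimate_degree_ccdf(neighbors, vertices, num_samples, rng=None):
    """Estimate the fraction of vertices with degree >= d for each sampled d.

    neighbors:   callable returning the neighbor list of a vertex (assumed
                 to be the only way we can query the graph)
    vertices:    sequence of vertex ids we can sample from uniformly
    num_samples: number of uniform vertex samples, drawn with replacement
    """
    rng = rng or random.Random()
    sample = [rng.choice(vertices) for _ in range(num_samples)]

    # Count how many sampled vertices have each observed degree.
    degree_counts = Counter(len(neighbors(v)) for v in sample)

    # Accumulate from the largest degree down to get the tail fractions.
    ccdf = {}
    running = 0
    for d in sorted(degree_counts, reverse=True):
        running += degree_counts[d]
        ccdf[d] = running / num_samples
    return ccdf


if __name__ == "__main__":
    # Toy star graph: vertex 0 is connected to vertices 1..20.
    adjacency = {0: list(range(1, 21))}
    adjacency.update({i: [0] for i in range(1, 21)})

    estimate = estimate_degree_ccdf(
        lambda v: adjacency[v], list(adjacency), num_samples=10,
        rng=random.Random(7),
    )
    print(estimate)
```

On a heavy-tailed graph, a uniform sample like this can easily miss the few high-degree vertices altogether. That limitation is one reason estimators with provable guarantees are of interest.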