The Consistency of Common Neighbors for Link Prediction in Stochastic Blockmodels

Neural Information Processing Systems 

Link prediction and clustering are key problems for network-structured data. While spectral clustering has strong theoretical guarantees under the popular stochastic blockmodel formulation of networks, it can be expensive for large graphs. On the other hand, the heuristic of predicting links to nodes that share the most common neighbors with the query node is much faster, and works very well in practice. We show theoretically that the common neighbors heuristic can extract clusters with high probability when the graph is dense enough, and can do so even in sparser graphs with the addition of a "cleaning" step. Empirical results on simulated and real-world data support our conclusions.
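
To make the heuristic concrete, the following is a minimal sketch of common-neighbors link prediction on a toy undirected graph. It is an illustration only, not the paper's implementation: the function names `common_neighbors_scores` and `predict_links`, the adjacency-set representation, the tie-breaking rule, and the toy edge list are all assumptions introduced here, and the "cleaning" step is not shown.

```python
def common_neighbors_scores(adj, query):
    """Count neighbors shared by `query` and each node not already linked to it.

    `adj` maps each node to the set of its neighbors (hypothetical layout).
    """
    query_nbrs = adj[query]
    scores = {}
    for node, nbrs in adj.items():
        if node == query or node in query_nbrs:
            continue  # skip the query itself and existing links
        scores[node] = len(query_nbrs & nbrs)
    return scores


def predict_links(adj, query, k=1):
    """Return the k candidates sharing the most common neighbors with `query`.

    Ties are broken by node label, an arbitrary choice made for this sketch.
    """
    scores = common_neighbors_scores(adj, query)
    return sorted(scores, key=lambda n: (-scores[n], n))[:k]


# Toy graph (assumed for illustration): two groups joined by the edge (3, 4).
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3),
         (3, 4), (4, 5), (4, 6), (5, 6)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

scores = common_neighbors_scores(adj, query=0)
top = predict_links(adj, query=0, k=1)
print(scores, top)
```

In this toy graph, node 0 shares two neighbors with node 3 and none with nodes 4, 5, or 6, so node 3 is the top prediction. Each score costs only a set intersection over the query's neighborhood, which is the source of the speed advantage over spectral methods noted above.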