Breaking the Small Cluster Barrier of Graph Clustering

Feb-20-2013–arXiv.org Machine Learning

This paper considers a classic problem in machine learning and theoretical computer science, namely graph clustering, i.e., given an undirected unweighted graph, partition the nodes into disjoint clusters, so that the density of edges within one cluster is higher than those across clusters. Graph clustering arises naturally in many application across science and engineering. Some prominent examples include community detection in social network Mishra et al. [2007], submarket identification in E-commerce and sponsored search Yahoo!-Inc [2009], and co-authorship analysis in analyzing document database Ester et al. [1995], among others. From a purely binary classification theoretical point of view, the edges of the graph are (noisy) labels of similarity or affinity between pairs of objects, and the concept class consists of clusterings of the objects (encoded graphically by identifying clusters with cliques). Many theoretical results in graph clustering [e.g., Boppana, 1987, Chen et al., 2012, McSherry, 2001] consider the planted partition model, in which the edges are generated randomly; see Section 1.1 for more details. While numerous different methods have been proposed, their performance guarantees all share the following manner - under certain condition of the density of edges (within clusters and across clusters), the proposed method succeeds to recover the correct clusters exactly if all clusters are larger than a threshold size, typically Ω( n).

artificial intelligence, machine learning, probability, (19 more...)

arXiv.org Machine Learning

Feb-20-2013

arXiv.org PDF

Add feedback

Country:
- Asia (0.28)

Genre:
- Research Report (1.00)

Industry:
- Information Technology > Services (0.54)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found