kwikcluster
- Europe > Italy > Piedmont > Turin Province > Turin (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
- (4 more...)
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
- North America > United States (0.04)
- North America > Canada (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Italy > Piedmont > Turin Province > Turin (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
- (4 more...)
Parallel Correlation Clustering on Big Graphs
Xinghao Pan, Dimitris Papailiopoulos, Samet Oymak, Benjamin Recht, Kannan Ramchandran, Michael I. Jordan
Given a similarity graph between items, correlation clustering (CC) groups similar items together and dissimilar ones apart. One of the most popular CC algorithms is KwikCluster: an algorithm that serially clusters neighborhoods of vertices, and obtains a 3-approximation ratio. Unfortunately, in practice KwikCluster requires a large number of clustering rounds, a potential bottleneck for large graphs.
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
- Europe > United Kingdom (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
- North America > United States (0.04)
- North America > Canada (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Reviews: Correlation Clustering with Adaptive Similarity Queries
This paper studies correlation clustering in an active learning setting where the learner does not query for all the {n choose 2} pairs of vertices. In the standard setting, the algorithm KwikCluster of Charikar et al. achieves a clustering cost of 3OPT where OPT is the cost of the best clustering. Here the cost of the clustering is defined as the number of pairs labeled 1 and put in different clusters plus the number of pairs labeled -1 and put in the same cluster. The main contribution of the paper is a variant ACC of KwikCluster that achieves a cost of 3OPT O(n 3/Q) where Q is an upper bound on the number of queries made by the algorithm. The authors also prove a matching lower bound on the error of any randomized algorithm that makes Q queries.
Parallel Correlation Clustering on Big Graphs Xinghao Pan
Given a similarity graph between items, correlation clustering (CC) groups similar items together and dissimilar ones apart. One of the most popular CC algorithms is KwikCluster: an algorithm that serially clusters neighborhoods of vertices, and obtains a 3-approximation ratio. Unfortunately, in practice KwikCluster requires a large number of clustering rounds, a potential bottleneck for large graphs. We present C4 and ClusterWild!, two algorithms for parallel correlation clustering that run in a polylogarithmic number of rounds, and provably achieve nearly linear speedups. C4 uses concurrency control to enforce serializability of a parallel clustering process, and guarantees a 3-approximation ratio. ClusterWild! is a coordination free algorithm that abandons consistency for the benefit of better scaling; this leads to a provably small loss in the 3 approximation ratio. We demonstrate experimentally that both algorithms outperform the state of the art, both in terms of clustering accuracy and running time. We show that our algorithms can cluster billion-edge graphs in under 5 seconds on 32 cores, while achieving a 15 speedup.
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
- Europe > United Kingdom (0.04)
- Asia > Middle East > Jordan (0.04)
Query-Efficient Correlation Clustering with Noisy Oracle
Kuroki, Yuko, Miyauchi, Atsushi, Bonchi, Francesco, Chen, Wei
We study a general clustering setting in which we have $n$ elements to be clustered, and we aim to perform as few queries as possible to an oracle that returns a noisy sample of the similarity between two elements. Our setting encompasses many application domains in which the similarity function is costly to compute and inherently noisy. We propose two novel formulations of online learning problems rooted in the paradigm of Pure Exploration in Combinatorial Multi-Armed Bandits (PE-CMAB): fixed confidence and fixed budget settings. For both settings, we design algorithms that combine a sampling strategy with a classic approximation algorithm for correlation clustering and study their theoretical guarantees. Our results are the first examples of polynomial-time algorithms that work for the case of PE-CMAB in which the underlying offline optimization problem is NP-hard.
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- (4 more...)