Dynamic Correlation Clustering in Sublinear Update Time
Cohen-Addad, Vincent, Lattanzi, Silvio, Maggiori, Andreas, Parotsidis, Nikos
–arXiv.org Artificial Intelligence
Clustering is a cornerstone of contemporary machine learning and data analysis. A successful clustering algorithm partitions data elements so that similar items reside within the same group, while dissimilar items are separated. Introduced in 2004 by Bansal, Blum, and Chawla (Bansal et al., 2004), the correlation clustering objective offers a natural way to model this problem. Thanks to its concise and elegant formulation, the problem has drawn significant interest from researchers and practitioners, leading to applications across diverse domains, including ensemble clustering identification (Bonchi et al., 2013), duplicate detection (Arasu et al., 2009), community mining (Chen et al., 2012), disambiguation tasks (Kalashnikov et al., 2008), automated labeling (Agrawal et al., 2009; Chakrabarti et al., 2008), and many more. In the correlation clustering problem, we are given a graph in which every edge carries either a positive or a negative label: a positive edge (u, v) indicates that u and v are similar elements, while a negative edge (u, v) indicates that they are dissimilar. The objective is to compute a partition of the vertices that minimizes the number of negative edges within clusters plus the number of positive edges between clusters. Since the problem is NP-hard, researchers have focused on designing approximation algorithms. The algorithm proposed by Cao et al. (2024) achieves an approximation ratio of 1.43 + ϵ, improving upon the previous ratios of 1.73 + ϵ and 1.994 + ϵ achieved by Cohen-Addad et al. (2023; 2022b). Prior to these developments, the best approximation guarantee of 2.06 was attained by the algorithm of Chawla et al. (2015).
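To make the objective concrete, the following minimal Python sketch counts the disagreements of a candidate partition: negative edges placed inside a cluster plus positive edges cut between clusters. The function and variable names are illustrative and not taken from the paper.

```python
# Sketch of the correlation clustering objective (illustrative only).
# Edges are (u, v, label) with label +1 for "similar" and -1 for "dissimilar";
# a clustering is a dict mapping each vertex to a cluster id.

def correlation_clustering_cost(edges, clustering):
    """Count disagreements: negative edges within clusters
    plus positive edges between clusters."""
    cost = 0
    for u, v, label in edges:
        same_cluster = clustering[u] == clustering[v]
        if label < 0 and same_cluster:
            cost += 1  # dissimilar pair placed in the same cluster
        elif label > 0 and not same_cluster:
            cost += 1  # similar pair split across clusters
    return cost

# Toy example: three mutually similar items and one dissimilar neighbor.
edges = [("a", "b", +1), ("b", "c", +1), ("a", "c", +1), ("c", "d", -1)]
clustering = {"a": 0, "b": 0, "c": 0, "d": 1}
print(correlation_clustering_cost(edges, clustering))  # -> 0 disagreements
```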
Jun-13-2024
- Country:
- North America > United States
- California (0.14)
- Maryland (0.14)
- Genre:
- Research Report (0.49)
- Technology: