correlation clustering
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
- North America > United States (0.04)
- North America > Canada (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Afghanistan > Parwan Province > Charikar (0.06)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- North America > Canada (0.04)
- Asia > China > Hong Kong (0.04)
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Asia > China > Guangdong Province > Guangzhou (0.04)
- Asia > Afghanistan > Parwan Province > Charikar (0.07)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
- (2 more...)
Correlation Clustering with Adaptive Similarity Queries
In correlation clustering, we are given $n$ objects together with a binary similarity score between each pair of them. The goal is to partition the objects into clusters so to minimise the disagreements with the scores. In this work we investigate correlation clustering as an active learning problem: each similarity score can be learned by making a query, and the goal is to minimise both the disagreements and the total number of queries. On the one hand, we describe simple active learning algorithms, which provably achieve an almost optimal trade-off while giving cluster recovery guarantees, and we test them on different datasets. On the other hand, we prove information-theoretical bounds on the number of queries necessary to guarantee a prescribed disagreement bound. These results give a rich characterization of the trade-off between queries and clustering error.
Generalizing Fair Clustering to Multiple Groups: Algorithms and Applications
Chakraborty, Diptarka, Chatterjee, Kushagra, Das, Debarati, Nguyen, Tien-Long
Clustering is a fundamental task in machine learning and data analysis, but it frequently fails to provide fair representation for various marginalized communities defined by multiple protected attributes -- a shortcoming often caused by biases in the training data. As a result, there is a growing need to enhance the fairness of clustering outcomes, ideally by making minimal modifications, possibly as a post-processing step after conventional clustering. Recently, Chakraborty et al. [COLT'25] initiated the study of \emph{closest fair clustering}, though in a restricted scenario where data points belong to only two groups. In practice, however, data points are typically characterized by many groups, reflecting diverse protected attributes such as age, ethnicity, gender, etc. In this work, we generalize the study of the \emph{closest fair clustering} problem to settings with an arbitrary number (more than two) of groups. We begin by showing that the problem is NP-hard even when all groups are of equal size -- a stark contrast with the two-group case, for which an exact algorithm exists. Next, we propose near-linear time approximation algorithms that efficiently handle arbitrary-sized multiple groups, thereby answering an open question posed by Chakraborty et al. [COLT'25]. Leveraging our closest fair clustering algorithms, we further achieve improved approximation guarantees for the \emph{fair correlation clustering} problem, advancing the state-of-the-art results established by Ahmadian et al. [AISTATS'20] and Ahmadi et al. [2020]. Additionally, we are the first to provide approximation algorithms for the \emph{fair consensus clustering} problem involving multiple (more than two) groups, thus addressing another open direction highlighted by Chakraborty et al. [COLT'25].
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
- North America > United States > Pennsylvania (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- (2 more...)
- Asia > Afghanistan > Parwan Province > Charikar (0.07)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
- (2 more...)
- Asia > Afghanistan > Parwan Province > Charikar (0.06)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- North America > Canada (0.04)
Parallel Correlation Clustering on Big Graphs
Xinghao Pan, Dimitris Papailiopoulos, Samet Oymak, Benjamin Recht, Kannan Ramchandran, Michael I. Jordan
Given a similarity graph between items, correlation clustering (CC) groups similar items together and dissimilar ones apart. One of the most popular CC algorithms is KwikCluster: an algorithm that serially clusters neighborhoods of vertices, and obtains a 3-approximation ratio. Unfortunately, in practice KwikCluster requires a large number of clustering rounds, a potential bottleneck for large graphs.
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
- Europe > United Kingdom (0.04)
- Asia > Middle East > Jordan (0.04)
Cold-Start Active Correlation Clustering
Aronsson, Linus, Wu, Han, Chehreghani, Morteza Haghir
We study active correlation clustering where pairwise similarities are not provided upfront and must be queried in a cost-efficient manner through active learning. Specifically, we focus on the cold-start scenario, where no true initial pairwise similarities are available for active learning. To address this challenge, we propose a coverage-aware method that encourages diversity early in the process. We demonstrate the effectiveness of our approach through several synthetic and real-world experiments.