Clustering Stable Instances of Euclidean k-means.

Vijayaraghavan, Aravindan, Dutta, Abhratanu, Wang, Alex

Neural Information Processing Systems 

The Euclidean k-means problem is arguably the most widely-studied clustering problem in machine learning. While the k-means objective is NP-hard in the worst-case, practitioners have enjoyed remarkable success in applying heuristics like Lloyd's algorithm for this problem. To address this disconnect, we study the following question: what properties of real-world instances will enable us to design efficient algorithms and prove guarantees for finding the optimal clustering? We consider a natural notion called additive perturbation stability that we believe captures many practical instances of Euclidean k-means clustering. Stable instances have unique optimal k-means solutions that does not change even when each point is perturbed a little (in Euclidean distance).