ABeyond-Worst-Case Analysis of Greedy k-means + +
–Neural Information Processing Systems
Greedy k-means++ is a generalization of k-means++ where, in each iteration, a new seed is greedily chosen among multiple ℓ 2points sampled, as opposed to a single seed being sampled in k-means++. While empirical studies consistently show the superior performance of greedy k-means++, making it a preferred method in practice, a discrepancy exists between theory and practice. No theoretical justification currently explains this improved performance. Indeed, the prevailing theory suggests that greedy k-means++ exhibits worse performance than k-means++ in worst-case scenarios. This paper presents an analysis demonstrating the outperformance of the greedy algorithm compared to k-means++ for a natural class of well-separated instances with exponentially decaying distributions, such as Gaussian, specifically when ℓ = lnk +Θ(1), a common parameter setting in practical applications.
Neural Information Processing Systems
Jun-20-2026, 10:27:45 GMT