Goto

Collaborating Authors

 constant factor


ABeyond-Worst-Case Analysis of Greedy k-means + +

Neural Information Processing Systems

Greedy k-means++ is a generalization of k-means++ where, in each iteration, a new seed is greedily chosen among multiple โ„“ 2points sampled, as opposed to a single seed being sampled in k-means++. While empirical studies consistently show the superior performance of greedy k-means++, making it a preferred method in practice, a discrepancy exists between theory and practice. No theoretical justification currently explains this improved performance. Indeed, the prevailing theory suggests that greedy k-means++ exhibits worse performance than k-means++ in worst-case scenarios. This paper presents an analysis demonstrating the outperformance of the greedy algorithm compared to k-means++ for a natural class of well-separated instances with exponentially decaying distributions, such as Gaussian, specifically when โ„“ = lnk +ฮ˜(1), a common parameter setting in practical applications.