constant factor
ABeyond-Worst-Case Analysis of Greedy k-means + +
Greedy k-means++ is a generalization of k-means++ where, in each iteration, a new seed is greedily chosen among multiple โ 2points sampled, as opposed to a single seed being sampled in k-means++. While empirical studies consistently show the superior performance of greedy k-means++, making it a preferred method in practice, a discrepancy exists between theory and practice. No theoretical justification currently explains this improved performance. Indeed, the prevailing theory suggests that greedy k-means++ exhibits worse performance than k-means++ in worst-case scenarios. This paper presents an analysis demonstrating the outperformance of the greedy algorithm compared to k-means++ for a natural class of well-separated instances with exponentially decaying distributions, such as Gaussian, specifically when โ = lnk +ฮ(1), a common parameter setting in practical applications.