Consistency and Inconsistency in $K$-Means Clustering
Blanchard, Moïse, Jaffe, Adam Quinn, Zhivotovskiy, Nikita
A celebrated result of Pollard proves asymptotic consistency for $k$-means clustering when the population distribution has finite variance. In this work, we point out that the population-level $k$-means clustering problem is, in fact, well-posed under the weaker assumption of a finite expectation, and we investigate whether some form of asymptotic consistency holds in this setting. As we illustrate in a variety of negative results, the complete story is quite subtle; for example, the empirical $k$-means cluster centers may fail to converge even if there exists a unique set of population $k$-means cluster centers. A detailed analysis of our negative results reveals that inconsistency arises because of an extreme form of cluster imbalance, whereby the presence of outlying samples leads to some empirical $k$-means clusters possessing very few points. We then give a collection of positive results which show that some forms of asymptotic consistency, under only the assumption of finite expectation, may be recovered by imposing some a priori degree of balance among the empirical $k$-means clusters.
Jul-9-2025
- Country:
- Europe > Estonia
- Tartu County > Tartu (0.04)
- North America > United States
- California > Alameda County
- Berkeley (0.04)
- New York > New York County
- New York City (0.04)
- Rhode Island > Providence County
- Providence (0.04)
- California > Alameda County
- Europe > Estonia
- Genre:
- Research Report (0.82)
- Technology: