Phase transitions and optimal algorithms in high-dimensional Gaussian mixture clustering
Lesieur, Thibault, De Bacco, Caterina, Banks, Jess, Krzakala, Florent, Moore, Cris, Zdeborová, Lenka
Abstract-- We consider the problem of Gaussian mixture clustering in the high-dimensional limit where the data consists of m points in n dimensions, n, m and α m/n stays finite. Using exact but non-rigorous methods from statistical physics, we determine the critical value of α and the distance between the clusters at which it becomes information-theoretically possible to reconstruct the membership into clusters better than chance. We also determine the accuracy achievable by the Bayes-optimal estimation algorithm. In particular, we find that when the number of clusters is sufficiently large, r 4 2 α, there is a gap between the threshold for informationtheoretically optimal performance and the threshold at which known algorithms succeed. Clustering m points in n-dimensional space is a ubiquitous problem in statistical inference and data science.
Oct-10-2016