Identifiability of Nonparametric Mixture Models and Bayes Optimal Clustering

Aragam, Bryon, Dan, Chen, Ravikumar, Pradeep, Xing, Eric P.

arXiv.org Machine Learning 

In data clustering, a central problem is to define a proper notion of a "cluster" or equivalently, a partition of the input space [29]. Given such a target partition, it becomes possible to evaluate clustering algorithms in a consistent manner. Modern approaches include mode clustering [20], density clustering [43, 46-48], stochastic blockmodels [2, 24, 32, 45], and hierarchical clustering [19, 30, 54]. The most classical approach to this problem, however, is arguably Gaussian model-based clustering, in which points are partitioned according to a generative Gaussian mixture model [8, 22]. When this model is appropriate, it provides a simple, well-defined partition by which clustering algorithms can be evaluated and compared. This model has been extended to various parametric and semiparametric models [12, 25, 57], however, the extension of this methodology to general nonparametric settings has remained elusive. This is largely due to the extreme nonidentifiability of nonparametric mixture models, a problem which is well-studied but for which existing results require strong structural assumptions [15, 34, 35, 53]. Thus, it is a significant open problem to generalize these assumptions to a more flexible class of nonparametric mixture models. Let us set the stage for this problem in some generality.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found