Identifiability of Nonparametric Mixture Models and Bayes Optimal Clustering

Aragam, Bryon, Dan, Chen, Ravikumar, Pradeep, Xing, Eric P.

Feb-12-2018–arXiv.org Machine Learning

In data clustering, a central problem is to define a proper notion of a "cluster" or equivalently, a partition of the input space [29]. Given such a target partition, it becomes possible to evaluate clustering algorithms in a consistent manner. Modern approaches include mode clustering [20], density clustering [43, 46-48], stochastic blockmodels [2, 24, 32, 45], and hierarchical clustering [19, 30, 54]. The most classical approach to this problem, however, is arguably Gaussian model-based clustering, in which points are partitioned according to a generative Gaussian mixture model [8, 22]. When this model is appropriate, it provides a simple, well-defined partition by which clustering algorithms can be evaluated and compared. This model has been extended to various parametric and semiparametric models [12, 25, 57], however, the extension of this methodology to general nonparametric settings has remained elusive. This is largely due to the extreme nonidentifiability of nonparametric mixture models, a problem which is well-studied but for which existing results require strong structural assumptions [15, 34, 35, 53]. Thus, it is a significant open problem to generalize these assumptions to a more flexible class of nonparametric mixture models. Let us set the stage for this problem in some generality.

artificial intelligence, machine learning, mixture model, (18 more...)

arXiv.org Machine Learning

Feb-12-2018

arXiv.org PDF

Add feedback

Genre:
- Research Report (0.50)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found