Probabilistic Multilevel Clustering via Composite Transportation Distance

Ho, Nhat; Huynh, Viet; Phung, Dinh; Jordan, Michael I.

arXiv.org Machine Learning 

Clustering is a classic and fundamental problem in machine learning. Popular clustering methods such as K-means and the EM algorithm have been the workhorses of exploratory data analysis. However, the underlying model for such methods is a simple flat partition or a mixture model, neither of which captures the multilevel structures (e.g., words are grouped into documents, documents are grouped into corpora) that arise in many applications in the physical, biological or cognitive sciences. The clustering of multilevel structured data calls for novel methodologies beyond classical clustering. One natural approach for capturing multilevel structures is to use a hierarchy in which data are clustered locally into groups, and those groups are partitioned in a "global clustering." Attempts to develop algorithms of this kind can be roughly classified into two categories. The first category makes use of probabilistic models, often based on Dirichlet process priors.
