Probabilistic Multilevel Clustering via Composite Transportation Distance

Ho, Nhat, Huynh, Viet, Phung, Dinh, Jordan, Michael I.

Oct-28-2018, arXiv.org Machine Learning

Clustering is a classic and fundamental problem in machine learning. Popular clustering methods such as K-means and the EM algorithm have long been the workhorses of exploratory data analysis. However, the model underlying such methods is a simple flat partition or a mixture model, neither of which captures the multilevel structures (e.g., words are grouped into documents, documents are grouped into corpora) that arise in many applications in the physical, biological or cognitive sciences. Clustering multilevel structured data therefore calls for methodologies beyond classical clustering. One natural approach for capturing multilevel structures is to use a hierarchy in which data are clustered locally into groups, and those groups are in turn partitioned in a "global clustering." Attempts to develop algorithms of this kind can be roughly classified into two categories. The first category makes use of probabilistic models, often based on Dirichlet process priors.

artificial intelligence, machine learning, transportation distance

Country:
- North America > United States (0.28)

Genre:
- Research Report (0.82)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.66)
