Probabilistic Multilevel Clustering via Composite Transportation Distance
Nhat Ho, Viet Huynh, Dinh Phung, Michael I. Jordan
Clustering is a classic and fundamental problem in machine learning. Popular clustering methods such as K-means and the EM algorithm have been the workhorses of exploratory data analysis. However, the underlying model for such methods is a simple flat partition or a mixture model, neither of which captures the multilevel structures (e.g., words are grouped into documents, documents are grouped into corpora) that arise in many applications in the physical, biological, or cognitive sciences. The clustering of multilevel structured data therefore calls for novel methodologies beyond classical clustering. One natural approach for capturing multilevel structures is to use a hierarchy in which data are clustered locally into groups, and those groups are in turn partitioned in a "global clustering." Attempts to develop algorithms of this kind can be roughly classified into two categories. The first category makes use of probabilistic models, often based on Dirichlet process priors.
Oct-28-2018