Principal Ellipsoid Analysis (PEA): Efficient non-linear dimension reduction & clustering

Debolina Paul, Saptarshi Chakraborty, Didong Li, David Dunson

arXiv.org Machine Learning 

Clustering of data into groups of relatively similar observations is one of the canonical tasks in unsupervised learning. With an increasing focus in recent years on very richly parameterized models, there has been a corresponding emphasis in the literature on complex clustering algorithms. A popular theme has been clustering at the latent variable level, while estimating both the clustering structure and a complex nonlinear mapping from the latent to the observed data level. Such methods are appealing because they can realistically generate data that are indistinguishable from the observed data, while clustering observations in a lower-dimensional space. A particularly popular strategy is to build clustering algorithms on variational autoencoders (VAEs): for example, instead of drawing the latent variables in a VAE from standard Gaussian distributions, one can use a mixture of Gaussians for model-based clustering (Dilokthanakul et al., 2016; Lim et al., 2020; Yang et al., 2019). The problem with this family of methods is that, with a rich enough deep neural network, a VAE can accurately approximate any data-generating distribution regardless of the continuous density placed on the latent variables. If one uses a richer family of latent densities, such as a mixture model, one can potentially approximate the data distribution with a simpler neural network. However, the inferred clusters are then unreliable due to non-identifiability.
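For concreteness, the generative model such mixture-prior VAEs assume can be sketched as below. This is a minimal sketch; the notation (mixture weights \pi_k, component parameters \mu_k and \Sigma_k, decoder network g_\theta, noise scale \sigma) is illustrative and not taken from the paper.

```latex
% Sketch of a VAE with a mixture-of-Gaussians prior on the latents.
% Notation (pi_k, mu_k, Sigma_k, g_theta, sigma) is assumed for illustration.
\begin{align*}
  c_i &\sim \mathrm{Categorical}(\pi_1, \dots, \pi_K)
      && \text{latent cluster label} \\
  z_i \mid c_i = k &\sim \mathcal{N}(\mu_k, \Sigma_k)
      && \text{mixture-of-Gaussians latent variable} \\
  x_i \mid z_i &\sim \mathcal{N}\bigl(g_\theta(z_i), \sigma^2 I\bigr)
      && \text{decoder } g_\theta \text{ is a deep neural network}
\end{align*}
```

The non-identifiability noted above arises because a sufficiently flexible g_\theta can absorb the mixture structure: different pairings of latent density and decoder can induce the same marginal distribution of x_i, so the cluster labels c_i are not pinned down by the observed data.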
