Cluster Exploration using Informative Manifold Projections
Stavros Gerolymatos, Xenophon Evangelopoulos, Vladimir Gusev, John Y. Goulermas
arXiv.org Artificial Intelligence
Data exploration focuses on identifying informative patterns that yield new insight and knowledge about a collection of data. Because such data are often high-dimensional, direct visual exploration is intractable for the human eye, and specialized manipulation of the original samples is therefore essential in practice. Dimensionality reduction methods have been at the forefront of this challenge Bishop [2006], aiming to recover lower-dimensional embeddings of the original data that facilitate the identification of underlying data cohorts and help to better understand the problem at hand. Perhaps the best-known dimensionality reduction approach is principal component analysis (PCA) Hotelling [1933], an efficient linear method that maximizes the variance along the projection vectors, which in practice appears insufficient for meaningful separation of cohorts. A variety of non-linear methods have also been proposed that instead focus on preserving the local structure of the data, such as Isomap Tenenbaum et al. [2000], LLE Roweis and Saul [2001], t-SNE van der Maaten and Hinton [2008], UMAP McInnes and Healy [2018], TriMap Amid and Warmuth [2019] and LargeVis Tang et al. [2016]. Projection pursuit (PP) Friedman and Tukey [1974], Caussinus and Ruiz-Gazen [2010] defines a family of dimensionality reduction methods that can produce various embedding effects depending on a suitably selected criterion. The kurtosis index Chiang et al. [2001] is one specific PP example that specializes in identifying "interesting" projections. Its minimization in particular penalizes the normality of the data distribution, thus promoting more meaningful separability when searching for clusters. The above approaches nevertheless all offer a single static projection that does not take into account any prior knowledge a practitioner may have regarding the high-dimensional latent structure.
Such projections can be uninformative, as they tend to illustrate the most evident features, which are often already known to the reader.
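The contrast drawn above between variance maximization (PCA) and kurtosis minimization (projection pursuit) can be illustrated with a small sketch. This is not the authors' method: the synthetic two-cluster dataset, the helper names (`kurtosis`, `pca_direction`, `min_kurtosis_direction`) and the brute-force search over angles are all assumptions chosen for illustration, and the search only works for 2-D data.

```python
import numpy as np

def kurtosis(z):
    """Non-excess sample kurtosis of a 1-D projection (about 3 for a Gaussian)."""
    z = z - z.mean()
    return np.mean(z**4) / np.mean(z**2) ** 2

def pca_direction(X):
    """Leading principal direction: eigenvector of the covariance matrix
    with the largest eigenvalue (eigh returns eigenvalues in ascending order)."""
    cov = np.cov(X - X.mean(axis=0), rowvar=False)
    _, eigvecs = np.linalg.eigh(cov)
    return eigvecs[:, -1]

def min_kurtosis_direction(X, n_angles=360):
    """Toy projection pursuit for 2-D data: try unit vectors on a grid of
    angles in [0, pi) and keep the one whose projection has minimal kurtosis."""
    Xc = X - X.mean(axis=0)
    angles = np.linspace(0.0, np.pi, n_angles, endpoint=False)
    dirs = np.column_stack([np.cos(angles), np.sin(angles)])
    scores = np.array([kurtosis(Xc @ d) for d in dirs])
    return dirs[np.argmin(scores)]

# Hypothetical data: a wide unimodal Gaussian along the first axis and
# two tight clusters (centred at -2 and +2) along the second axis.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(0.0, 5.0, size=n)
cluster = rng.integers(0, 2, size=n)
y = np.where(cluster == 0, -2.0, 2.0) + rng.normal(0.0, 0.5, size=n)
X = np.column_stack([x, y])

w_pca = pca_direction(X)          # follows the high-variance axis
w_pp = min_kurtosis_direction(X)  # follows the cluster-separating axis

print("PCA direction:          ", np.round(w_pca, 2))
print("min-kurtosis direction: ", np.round(w_pp, 2))
print("kurtosis along PCA:     ", round(kurtosis(X @ w_pca), 2))
print("kurtosis along PP:      ", round(kurtosis(X @ w_pp), 2))
```

In this construction the variance is largest along the axis that carries no cluster structure, so the PCA direction hides the two groups, whereas the low-kurtosis direction exposes the bimodal axis. Both remain single static projections, which is the limitation the passage goes on to raise.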
Sep-26-2023
- Country:
- Asia > India > West Bengal > Kolkata (0.04)
- Europe > Spain > Andalusia > Granada Province > Granada (0.04)
- Europe > United Kingdom > England > Merseyside > Liverpool (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- Genre:
- Research Report (0.64)
- Technology: