Interpretable Dimensionality Reduction by Feature Preserving Manifold Approximation and Projection

Yang, Yang; Sun, Hongjian; Gong, Jialei; Du, Yali; Yu, Di

arXiv.org Artificial Intelligence 

Nonlinear dimensionality reduction methods are ubiquitously applied to visualize and preprocess high-dimensional data in machine learning [1, 2, 3, 4, 5, 6, 7, 8]. These methods assume that the intrinsic dimension of the underlying manifold is much lower than the ambient dimension of the real-world data [9, 10, 11]. By approximating the manifold with a k-nearest-neighbour (kNN) graph, nonlinear dimensionality reduction projects data from a high-dimensional to a low-dimensional space while retaining the topological structure of the original data. While nonlinear dimensionality reduction is effective for visualizing high-dimensional data, one major weakness is the lack of interpretability of its reduced-dimension results [8]. The reduced dimensions have no specific meaning, unlike those of linear methods such as Principal Component Analysis (PCA), where the dimensions of the embedding space represent the directions of largest variance in the original data. In particular, nonlinear dimensionality reduction focuses on preserving distances between observations and thereby loses the source feature information in the embedding space. As a result, it cannot provide the feature loadings that linear methods such as PCA offer to explain the contribution of each feature to each dimension.

In this paper, we seek to improve the interpretability of nonlinear dimensionality reduction. In addition to preserving the local topological structure between observations in the embedding space, we aim to incorporate the source features to devise an interpretable nonlinear dimensionality reduction method. The feature information is encoded in the column space of the data, and we use the tangent space to describe the column space locally [12, 13].
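The contrast drawn above, between the feature loadings of PCA and the kNN-graph view of the manifold, together with the idea of describing the column space locally through tangent spaces, can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `knn_indices`, `pca_loadings` and `local_tangent_spaces` are hypothetical, and estimating each tangent space by running PCA on a point's kNN neighbourhood (local PCA) is an assumed estimator, since this section does not specify one.

```python
import numpy as np


def knn_indices(X, k):
    """Brute-force kNN graph: row i holds the indices of the k nearest
    neighbours of observation i (excluding i itself), by Euclidean distance."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # pairwise squared distances
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbour
    return np.argsort(d2, axis=1)[:, :k]


def pca_loadings(X, n_components):
    """Global PCA loadings: returns an (n_features, n_components) matrix whose
    columns are the directions of largest variance; entry (j, c) is the loading
    of feature j on component c. The sign of each column is arbitrary."""
    Xc = X - X.mean(axis=0)
    # The right singular vectors of the centred data are the principal directions.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[:n_components].T


def local_tangent_spaces(X, k, d):
    """Assumed tangent-space estimator (local PCA): for every observation, run
    PCA on its kNN neighbourhood and keep the top d directions.
    Returns an array of shape (n_samples, n_features, d); each [i] is an
    orthonormal basis, expressed in the original features, of the estimated
    tangent space at observation i."""
    nbrs = knn_indices(X, k)
    bases = np.empty((X.shape[0], X.shape[1], d))
    for i in range(X.shape[0]):
        patch = X[np.append(nbrs[i], i)]  # neighbourhood including the point itself
        bases[i] = pca_loadings(patch, d)
    return bases
```

For example, on points sampled from a unit circle embedded in three dimensions, `local_tangent_spaces(X, k=10, d=1)` should return, up to sign, unit vectors orthogonal to each point's radius, i.e. tangent to the circle, and expressed in terms of the original features. The brute-force distance matrix costs O(n²) memory, so a practical pipeline would replace `knn_indices` with an (approximate) nearest-neighbour index.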
