Visualizing Data using GTSNE

Shi, Songting

arXiv.org Machine Learning

High-dimensional data visualization is an important way for humans to make sense of data. The current state-of-the-art methods are t-SNE (Laurens et al. (2008), Laurens van der Maaten (2013)) and UMAP (McInnes and Healy (2018)), which rely on similar principles for nonlinear dimensionality reduction. Both use neighborhood probability distributions to connect the high-dimensional data points to low-dimensional map points. This tries to preserve local relative neighborhood relations but ignores changes in the macro structure of the data, so the low-dimensional map points may represent the high-dimensional structure unfaithfully. In particular, while preserving and patching together neighborhoods in the low-dimensional space, t-SNE sometimes breaks neighborhood relations that hold in the high-dimensional structure. We add a macro loss term to the t-SNE loss so that the relative structure of the k-means centroids is kept the same in the low- and high-dimensional spaces, which largely preserves the macro structure in the low-dimensional map.
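
To make the idea of a macro loss term concrete, the following is a minimal sketch and not the paper's formulation, since the abstract does not give the exact loss. It assumes that cluster labels (for example from k-means on the high-dimensional data) are already available, that every cluster is non-empty, and that the macro term is the squared mismatch between normalized pairwise centroid distances in the two spaces. The names `centroids`, `relative_distances`, `macro_loss`, `total_loss`, and the `weight` parameter are all hypothetical.

```python
import math
from itertools import combinations


def centroids(points, labels, k):
    """Mean of the points assigned to each of the k clusters (all assumed non-empty)."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, c in zip(points, labels):
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += p[d]
    return [[s / counts[c] for s in sums[c]] for c in range(k)]


def relative_distances(cents):
    """Pairwise centroid distances, rescaled so that they sum to 1."""
    dists = [math.dist(a, b) for a, b in combinations(cents, 2)]
    total = sum(dists)
    return [d / total for d in dists]


def macro_loss(high, low, labels, k):
    """Assumed macro term: squared mismatch of the relative centroid layouts."""
    rel_high = relative_distances(centroids(high, labels, k))
    rel_low = relative_distances(centroids(low, labels, k))
    return sum((a - b) ** 2 for a, b in zip(rel_high, rel_low))


def total_loss(kl_term, high, low, labels, k, weight=1.0):
    """Assumed combination: the usual t-SNE KL term plus a weighted macro term."""
    return kl_term + weight * macro_loss(high, low, labels, k)


# Three clusters in 3-D and two candidate 2-D maps.
labels = [0, 0, 1, 1, 2, 2]
high = [(0, 0, 0), (0, 0, 2), (10, 0, 0), (10, 0, 2), (0, 20, 0), (0, 20, 2)]
faithful = [(0, 0), (0, 0.2), (1, 0), (1, 0.2), (0, 2), (0, 2.2)]   # same layout, smaller scale
distorted = [(0, 0), (0, 0.2), (1, 0), (1, 0.2), (2, 0), (2, 0.2)]  # clusters forced onto a line

print(macro_loss(high, faithful, labels, 3))
print(macro_loss(high, distorted, labels, 3))
```

Normalizing the centroid distances makes this sketch insensitive to the overall scale of the map, so a map that reproduces the centroid layout at a different scale is not penalized, while one that rearranges the clusters is. Whether the paper normalizes in this way is not stated in the abstract.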