Metric Space Magnitude for Evaluating Unsupervised Representation Learning
Katharina Limbeck, Rayna Andreeva, Rik Sarkar, Bastian Rieck
Determining suitable low-dimensional representations of complex high-dimensional data is a challenging task in numerous applications. Whether it is preprocessing biological datasets prior to their analysis (Nguyen & Holmes, 2019), the visualisation of complex structure in single-cell sequencing data (Lähnemann et al., 2020), or the comparison of different manifold representations (Barannikov et al., 2022), an understanding of structural (dis)similarities is crucial, especially for datasets that are ever-increasing in size and dimensionality. The primary assumption driving such analyses is the manifold hypothesis, which assumes that data is a (noisy) subsample from some unknown manifold. Operating under this assumption, manifold learning methods have made large advances in detecting complex structures in data, but they typically use local measures of embedding quality, which ultimately rely on local approximations of manifolds by k-nearest neighbour graphs. However, such approximations, which require specific parameter choices and thresholds, can have a substantial negative impact on both embedding results and the interpretation of evaluation scores. Moreover, countering the increasing popularity of non-linear dimensionality reduction methods that claim to preserve local and global structures, recent work (Chari & Pachter, 2023) casts some doubt on the assumption that 'good' embeddings should also faithfully preserve distances, while raising questions of how to measure the inevitable distortions introduced by representation learning. Thus, there is a need for novel methods in representation learning that efficiently summarise data across varying levels of similarity, eliminating the need to rely on fixed neighbourhood graphs. Motivated by these considerations, we adopt a more general perspective that does not rely on manifold approximations.
To this end, we propose a novel embedding quality measure based on metric space magnitude, a recently proposed mathematical invariant that encapsulates numerous important geometric characteristics of metric spaces.
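The abstract does not spell out how magnitude is computed. As a minimal sketch, assuming the standard definition from the magnitude literature (for a finite metric space scaled by t > 0, form the similarity matrix Z with entries exp(-t d(x_i, x_j)), solve Z w = 1 for the weight vector w, and sum the weights), the quantity could be evaluated as follows. The function name `magnitude`, the use of NumPy, and the choice of the Euclidean metric are assumptions made for illustration; this is not the authors' embedding quality measure itself.

```python
import numpy as np


def magnitude(points, t=1.0):
    """Magnitude of a finite point cloud at scale t.

    Assumes the Euclidean metric and distinct points, so that the
    similarity matrix is invertible (hypothetical helper, for illustration).
    """
    X = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances d(x_i, x_j).
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    # Similarity matrix Z_ij = exp(-t * d(x_i, x_j)).
    Z = np.exp(-t * D)
    # The weight vector w solves Z w = 1; magnitude is the sum of the weights.
    w = np.linalg.solve(Z, np.ones(len(X)))
    return float(w.sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cloud = rng.normal(size=(50, 3))
    # Evaluate magnitude across several scales of similarity.
    for t in (0.1, 1.0, 10.0):
        print(t, magnitude(cloud, t))
```

Evaluating the function over a range of t gives a multi-scale summary of the point cloud: for small t the points look like a single point and the value approaches 1, while for large t the points look fully separated and the value approaches the number of points.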
Nov-27-2023