Riemannian Metric Learning: Closer to You than You Imagine

Gruffaz, Samuel, Sassen, Josua

arXiv.org Machine Learning 

In recent decades, machine learning research has focused on developing vector-based representations for various types of data, including images, text, and time series [22]. Learning a meaningful representation space is a foundational task that accelerates research progress, as exemplified by the success of Large Language Models (LLMs) [182]. A complementary challenge is learning a distance function (defining a metric space) that encodes aspects of the data's internal structure. This task is known as distance metric learning, or simply metric learning [20]. Metric learning methods find applications in every field that uses algorithms relying on a distance, such as the ubiquitous k-nearest neighbors classifier: classification and clustering [195], recommendation systems [89], optimal transport [45], and dimensionality reduction [116, 186]. However, a global distance alone provides a limited set of modeling tools for deriving computational algorithms and does not capture the intrinsic structure of the data. Hence, in this paper, we present a literature review of Riemannian metric learning, a generalization of metric learning that has recently demonstrated success across diverse applications, from causal inference [51, 59, 147] to generative modeling [100, 111, 170]. Unlike metric learning, Riemannian metric learning does not merely learn an embedding capturing distance information, but estimates a Riemannian metric characterizing distributions, curvature, and distances in the dataset, i.e., the Riemannian structure of the data.
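To make the distinction concrete, here is a toy sketch (illustrative only, not an example from the paper): classical distance metric learning in its Mahalanobis form replaces the Euclidean distance with d_A(x, y) = sqrt((x - y)^T A (x - y)) for a positive-definite matrix A learned from data, while Riemannian metric learning lets the metric G(p) vary with position, so that distances become lengths of shortest paths. The matrices A and G below are hypothetical stand-ins for learned metrics.

```python
import numpy as np

def mahalanobis(x, y, A):
    """Global distance from a fixed SPD matrix A: d_A(x, y) = sqrt((x-y)^T A (x-y))."""
    d = x - y
    return float(np.sqrt(d @ A @ d))

def curve_length(path, G):
    """Discrete length of a polyline under a position-dependent metric G(p):
    sum_i sqrt(d_i^T G(m_i) d_i), with m_i the segment midpoints."""
    total = 0.0
    for p, q in zip(path[:-1], path[1:]):
        d = q - p
        total += np.sqrt(d @ G((p + q) / 2) @ d)
    return float(total)

x, a, b = np.array([0.0, 0.0]), np.array([1.0, 0.1]), np.array([0.2, 1.0])

# A fixed (hypothetical) learned metric that strongly weights the first feature:
A = np.diag([10.0, 0.1])
print(mahalanobis(x, a, np.eye(2)) < mahalanobis(x, b, np.eye(2)))  # a is the Euclidean NN of x
print(mahalanobis(x, a, A) > mahalanobis(x, b, A))                  # under A, b becomes nearest

# A Riemannian metric varies with position: here, motion along the first
# axis gets "cheaper" as |p_0| grows, so straight lines shrink in length.
G = lambda p: np.diag([1.0 / (1.0 + p[0] ** 2), 1.0])
line = np.linspace([0.0, 0.0], [4.0, 0.0], 50)  # straight path along the first axis
print(curve_length(line, G) < 4.0)  # shorter than its Euclidean length of 4
```

Swapping the metric changes nearest-neighbor relations without touching the data itself, which is the basic leverage metric learning exploits; the position-dependent case is what the Riemannian setting adds on top.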