Embedding Dimension of Contrastive Learning and k -Nearest Neighbors

Mar-20-2026, 06:40:59 GMT–Neural Information Processing Systems

We study the embedding dimension of distance comparison data in two settings: contrastive learning and $k$-nearest neighbors ($k$-NN). In both cases, the goal is to find the smallest dimension $d$ of an $\ell_p$-space in which a given dataset can be represented. We show that the arboricity of the associated graphs plays a key role in designing embeddings. Using this approach, for the most frequently used $\ell_2$-distance, we get matching upper and lower bounds in both settings. In contrastive learning, we are given $m$ labeled samples of the form $(x_i, y_i^+, z_i^-)$ representing the fact that the positive example $y_i$ is closer to the anchor $x_i$ than the negative example $z_i$. We show that for representing such dataset in:- $\ell_2$: $d = \Theta(\sqrt{m})$ is necessary and sufficient.-

artificial intelligence, machine learning, proceedings, (8 more...)

Neural Information Processing Systems

Mar-20-2026, 06:40:59 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (1.00)