Embedding Dimension of Contrastive Learning and k -Nearest Neighbors
–Neural Information Processing Systems
We study the embedding dimension of distance comparison data in two settings: contrastive learning and $k$-nearest neighbors ($k$-NN). In both cases, the goal is to find the smallest dimension $d$ of an $\ell_p$-space in which a given dataset can be represented. We show that the arboricity of the associated graphs plays a key role in designing embeddings. Using this approach, for the most frequently used $\ell_2$-distance, we get matching upper and lower bounds in both settings. In contrastive learning, we are given $m$ labeled samples of the form $(x_i, y_i^+, z_i^-)$ representing the fact that the positive example $y_i$ is closer to the anchor $x_i$ than the negative example $z_i$. We show that for representing such dataset in:- $\ell_2$: $d = \Theta(\sqrt{m})$ is necessary and sufficient.-
Neural Information Processing Systems
Mar-20-2026, 06:40:59 GMT
- Technology: