Embedding Dimension of Contrastive Learning and k-Nearest Neighbors
Neural Information Processing Systems
We study the embedding dimension of distance comparison data in two settings: contrastive learning and k-nearest neighbors (k-NN). In both cases, the goal is to find the smallest dimension d of an \ell_p-space in which a given dataset can be represented. We show that the arboricity of the associated graphs plays a key role in designing embeddings. Using this approach, for the most frequently used \ell_2-distance, we get matching upper and lower bounds in both settings. In contrastive learning, we are given m labeled samples of the form (x_i, y_i^+, z_i^-) representing the fact that the positive example y_i is closer to the anchor x_i than the negative example z_i.
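To make the contrastive data format concrete, here is a minimal Python sketch (not from the paper; the function name satisfies_triplets and the toy items are illustrative assumptions) that checks whether a candidate \ell_2 embedding in dimension d satisfies a set of triplets (x_i, y_i^+, z_i^-):

```python
import numpy as np

def satisfies_triplets(embedding, triplets):
    """Check whether an embedding in R^d satisfies contrastive triplet
    constraints under the \\ell_2 distance.

    embedding: dict mapping each item to an np.ndarray of shape (d,)
    triplets:  iterable of (anchor, positive, negative) tuples; each one
               requires dist(anchor, positive) < dist(anchor, negative)
    """
    for x, y, z in triplets:
        d_pos = np.linalg.norm(embedding[x] - embedding[y])
        d_neg = np.linalg.norm(embedding[x] - embedding[z])
        if not d_pos < d_neg:  # constraint violated
            return False
    return True

# Toy example (hypothetical data): a 1-dimensional embedding
# satisfying a single triplet, since |0 - 1| < |0 - 3|.
emb = {"x": np.array([0.0]), "y": np.array([1.0]), "z": np.array([3.0])}
print(satisfies_triplets(emb, [("x", "y", "z")]))  # True
```

The embedding-dimension question studied in the paper is then: how small can d be while all m triplet constraints are still satisfiable?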