Diversity Is All You Need for Contrastive Learning: Spectral Bounds on Gradient Magnitudes

Neural Information Processing Systems 

Early work on Siamese networks \citep{chopra2005learning,hadsell2006dimensionality} already showed that pair construction directly shapes the learned representations. In modern contrastive frameworks, poor pair selection remains a primary failure mode: it either causes collapse, where all embeddings converge to a single point, or wastes the representational capacity of the embedding space \citep{chen2020simple,tian2020makes,khosla2020supervised}. Contemporary methods typically generate positives via semantics-preserving augmentations (e.g., cropping, jitter, view transformation), while negatives are drawn from the other samples in the mini-batch under the assumption that different images are semantically dissimilar.
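
For concreteness, one widely used instantiation of this in-batch scheme is the normalized temperature-scaled cross-entropy objective of \citep{chen2020simple}. The notation here is introduced only for illustration and is an assumption, not notation fixed elsewhere in this paper: $z_1,\dots,z_{2N}$ denote the embeddings of the two augmented views of each of $N$ images, $\mathrm{sim}(\cdot,\cdot)$ is cosine similarity, and $\tau>0$ is a temperature. For a positive pair $(i,j)$, the per-pair loss is
\begin{equation*}
\ell_{i,j} \;=\; -\log \frac{\exp\bigl(\mathrm{sim}(z_i, z_j)/\tau\bigr)}{\sum_{k=1}^{2N} \mathbf{1}_{[k \neq i]}\, \exp\bigl(\mathrm{sim}(z_i, z_k)/\tau\bigr)} .
\end{equation*}
Every index $k \notin \{i,j\}$ in the denominator is treated as a negative, which is exactly where the assumption that different images are semantically dissimilar enters.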