EMC$^2$: Efficient MCMC Negative Sampling for Contrastive Learning with Global Convergence
Chung-Yiu Yau, Hoi-To Wai, Parameswaran Raman, Soumajyoti Sarkar, Mingyi Hong
arXiv.org Artificial Intelligence
Contrastive representation learning has been instrumental in self-supervised learning for large-scale pretraining of foundation models (Radford et al., 2021; Cherti et al., 2023) as well as in the fine-tuning stage on downstream tasks (Xiong et al., 2020; Lindgren et al., 2021). It encodes real-world data into low-dimensional feature vectors that abstract the important attributes of the data and generalize well outside of the training distribution. More recently, contrastive learning with multi-modal data has helped embed different data modalities into the same feature space (Li et al., 2023), as in studies of vision-language models (Radford et al., 2021; Alayrac et al., 2022; Cherti et al., 2023) and document understanding (Xu et al., 2020; Lee et al., 2023). Contrastive learning uses pairwise comparisons of representations in the training objective, with the goal of learning representations in which positive pairs are drawn closer while negative pairs are pushed apart in the representation space. It is well known that generating a large dataset of pairwise samples, such as image-text pairs with the same semantics, costs far less than manual labeling; e.g., the WebImageText dataset used for training CLIP originates from Wikipedia articles (Radford et al., 2021).
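As a rough illustration of this pairwise objective (not the EMC$^2$ MCMC negative sampler proposed in the paper), the sketch below implements a standard InfoNCE-style contrastive loss in PyTorch with explicitly sampled negatives; the function name, tensor shapes, and temperature value are assumptions made for this example.

```python
# Minimal sketch of an InfoNCE-style contrastive loss with sampled negatives.
# This is an illustrative assumption, not the paper's EMC^2 estimator.
import torch
import torch.nn.functional as F

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """anchor: (d,), positive: (d,), negatives: (K, d) raw embeddings."""
    # Normalize so that inner products are cosine similarities.
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)

    # Similarity of the positive pair and of each negative pair.
    pos_logit = (anchor * positive).sum(-1, keepdim=True) / temperature  # (1,)
    neg_logits = negatives @ anchor / temperature                        # (K,)

    # Cross-entropy with the positive at index 0: pulls the positive pair
    # together and pushes the K negatives apart in representation space.
    logits = torch.cat([pos_logit, neg_logits]).unsqueeze(0)             # (1, K+1)
    target = torch.zeros(1, dtype=torch.long)
    return F.cross_entropy(logits, target)

# Example: one anchor, its positive, and K = 8 sampled negatives of dimension 64.
loss = info_nce_loss(torch.randn(64), torch.randn(64), torch.randn(8, 64))
```

The quality and cost of choosing the K negatives is exactly the bottleneck that motivates the paper's MCMC-based negative sampling scheme.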
Apr-16-2024