Tight PAC-Bayesian Risk Certificates for Contrastive Learning

Anna Van Elst, Debarghya Ghoshdastidar

arXiv.org (Machine Learning)

A key driving force behind the rapid advances in foundation models is the availability and exploitation of massive amounts of unlabeled data. Broadly, one learns meaningful representations from unlabeled data, reducing the demand for labeled samples when training (downstream) predictive models. In recent years, there has been a strong focus on self-supervised approaches to representation learning, which learn neural network-based embedding maps from carefully constructed augmentations of unlabeled data, such as image cropping, rotations, color distortion, and Gaussian blur [3, 7, 9, 15].

Contrastive representation learning is a popular form of self-supervised learning in which one aims to learn a mapping of the data to a Euclidean space such that semantically similar data, obtained via augmentations, are embedded closer together than independent samples [45, 19, 24]. This technique gained widespread attention with the introduction of SimCLR, an abbreviation for "simple framework for contrastive learning of representations" [7]. The SimCLR framework employs a carefully designed contrastive loss to maximize the similarity between the representations of augmented views of the same sample while minimizing the similarity between representations of different samples [39, 7]. Although SimCLR remains one of the most practically used contrastive models, theoretical analysis of SimCLR's performance and generalization abilities is still limited [4, 32].

The study of generalization error in self-supervised models is mostly based on two distinct frameworks [2, 17], both introduced in the context of contrastive learning. The contrastive unsupervised representation learning (CURL) framework, introduced by Arora et al. [2], assumes access to tuples z
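As an illustration of the SimCLR objective described above, the following NumPy snippet sketches the NT-Xent (normalized temperature-scaled cross-entropy) loss. This is a minimal sketch, not the paper's implementation: the function name, the convention that rows 2k and 2k+1 are the two augmented views of sample k, and the temperature value are all illustrative assumptions.

```python
import numpy as np

def nt_xent_loss(z, temperature=0.5):
    """Sketch of the NT-Xent loss used in SimCLR-style contrastive learning.

    z : array of shape (2N, d), where rows 2k and 2k+1 are embeddings of
        two augmented views of the same underlying sample (an assumed layout).
    """
    # Normalize embeddings so dot products are cosine similarities.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = (z @ z.T) / temperature          # pairwise scaled similarities
    n = z.shape[0]
    np.fill_diagonal(sim, -np.inf)         # a view is never its own negative
    # Index of each row's positive partner: (2k, 2k+1) are paired, so XOR with 1.
    pos = np.arange(n) ^ 1
    # Cross-entropy of the positive against all other pairs (log-softmax per row).
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(n), pos].mean()
```

Minimizing this loss pulls each pair of augmented views together on the unit sphere while pushing all other (negative) pairs apart, which is exactly the "maximize similarity of views of the same sample, minimize similarity across samples" behavior described above.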