Understanding the Benefits of SimCLR Pre-Training in Two-Layer Convolutional Neural Networks

Zhang, Han; Cao, Yuan

arXiv.org Machine Learning 

Department of Statistics and Actuarial Science, The University of Hong Kong; e-mail: hzhang23@connect.hku.hk

In recent years, self-supervised learning has emerged as a promising machine learning paradigm, offering a way to learn meaningful representations from vast amounts of unlabeled data. Self-supervised learning is vitally important because the success of supervised learning depends on access to large amounts of carefully labeled data, and high-quality labeled data are expensive and time-consuming to obtain. Self-supervised learning instead uses a large amount of unlabeled data to pre-train representations for a subsequent supervised fine-tuning task, without requiring additional labeled data. Major categories of self-supervised learning methods include contrastive learning (Oord et al., 2018; Chen et al., 2020; He et al., 2020) and generative self-supervised learning (Kingma and Welling, 2013; Goodfellow et al., 2014). Among the various self-supervised learning methods, the SimCLR algorithm (Chen et al., 2020) has gained significant attention due to its simplicity and remarkable performance on vision tasks. SimCLR leverages the idea of contrastive learning, where representations are learned by maximizing agreement between differently augmented views of the same image while minimizing agreement between views of different images. Compared with purely supervised learning, this approach has demonstrated exceptional capabilities in capturing high-level semantic information and has achieved state-of-the-art results on various downstream tasks.
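
To make the contrastive objective concrete, the following is a minimal NumPy sketch of the normalized temperature-scaled cross-entropy (NT-Xent) loss used by SimCLR (Chen et al., 2020). It is an illustration only and is not necessarily the exact formulation analyzed in this paper: the function name `nt_xent_loss`, the default `temperature=0.5`, and the convention that `z1[i]` and `z2[i]` are embeddings of two augmented views of the same image are all assumptions made for this example.

```python
import numpy as np


def nt_xent_loss(z1, z2, temperature=0.5):
    """Illustrative NT-Xent loss for n positive pairs (z1[i], z2[i]).

    z1, z2: arrays of shape (n, d), embeddings of two augmented views.
    temperature: assumed scaling hyperparameter (0.5 is an arbitrary default).
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)              # stack views: (2n, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit norm, so dot = cosine
    sim = (z @ z.T) / temperature                     # scaled pairwise similarities
    np.fill_diagonal(sim, -np.inf)                    # a view is never its own negative

    # Row i in the first half is paired with row i + n, and vice versa.
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    shifted = sim - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Average negative log-probability assigned to the positive view.
    return -log_prob[np.arange(2 * n), positives].mean()
```

Minimizing this quantity raises the similarity between the two views of each image (the positive pair) relative to the similarity with every other view in the batch (the negatives), which is the "maximize agreement, minimize agreement" behavior described above. In this sketch, embeddings whose positive pairs are nearly identical yield a smaller loss than embeddings whose pairs are unrelated.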