The Benefits of Balance: From Information Projections to Variance Reduction
Liu, Lang, Mehta, Ronak, Pal, Soumik, Harchaoui, Zaid
Deep neural networks have shown remarkable success at learning task-specific representations of data when provided supervision from massive amounts of labeled training examples. Recent trends, however, have shifted toward task-agnostic, universal representations that may be easily fine-tuned or even have zero-shot capabilities out-of-the-box. Supervised learning, stricto sensu, is too limited a framework for these billion-parameter, data-hungry models, and a question at the heart of modern machine learning is learning from unlabeled, partially labeled, or weakly labeled data. This need has paved the way for the current generation of self-supervised learning (SSL) approaches that circumvent the need for large amounts of strong labels. In SSL, a model is trained on a generic pseudo-task that can be performed on unlabeled data, such as relating the two modalities of an image-caption pair or two augmentations of the same image. Despite several modern foundation models such as DINO (Caron et al., 2021; Oquab et al., 2024) and CLIP (Radford et al., 2021) being trained in this fashion, many aspects of SSL remain baffling. In particular, the training process of self-supervised models often outgrows and "breaks the rules" of the standard empirical risk minimization (ERM) toolkit. ERM combines two well-understood techniques: minibatch sampling and gradient-based optimization using backpropagation. SSL, on the other hand, adds clever, less-understood techniques to the training pipeline.
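The ERM recipe mentioned above, minibatch sampling combined with a gradient step, can be sketched on a toy problem. Everything here (the quadratic loss, the step size, the batch size) is an illustrative assumption, not a detail from the paper:

```python
# Minimal sketch of ERM training: repeatedly sample a minibatch and
# take a gradient step on the average loss over that batch.
# Toy setting: fit a scalar weight w so that w*x matches y = 2*x,
# using the squared error (w*x - y)^2 as the per-example loss.
import random

random.seed(0)

data = [(x, 2.0 * x) for x in [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]]

w = 0.0          # model parameter, initialized at zero
lr = 0.05        # step size (assumed, for illustration)
batch_size = 2   # minibatch size (assumed, for illustration)

for step in range(500):
    batch = random.sample(data, batch_size)  # minibatch sampling
    # Gradient of the average squared error over the minibatch.
    grad = sum(2.0 * (w * x - y) * x for x, y in batch) / batch_size
    w -= lr * grad                           # gradient-based update

print(round(w, 3))  # → 2.0
```

SSL pipelines keep this outer loop but change what the loss computes on each minibatch, which is where the less-understood techniques the abstract refers to enter.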
Aug-27-2024