The Benefits of Balance: From Information Projections to Variance Reduction

Lang Liu, Ronak Mehta, Soumik Pal, Zaid Harchaoui

arXiv.org Machine Learning 

Deep neural networks have shown remarkable success at learning task-specific representations of data when provided supervision from massive amounts of labeled training examples. Recent trends, however, have shifted toward task-agnostic, universal representations that may be easily fine-tuned or even have zero-shot capabilities out of the box. Supervised learning, stricto sensu, is too limited a framework for these billion-parameter, data-hungry models, and a question at the heart of modern machine learning is learning from unlabeled, partially labeled, or weakly labeled data. This need has paved the way for the current generation of self-supervised learning (SSL) approaches that circumvent the need for large amounts of strong labels. In SSL, a model is trained on a generic pseudo-task that can be performed on unlabeled data, such as relating the two modalities of an image-caption pair or two augmentations of the same image. Despite several modern foundation models such as DINO (Caron et al., 2021; Oquab et al., 2024) and CLIP (Radford et al., 2021) being trained in this fashion, many aspects of SSL remain baffling. In particular, the training process of self-supervised models often outgrows and "breaks the rules" of the standard empirical risk minimization (ERM) toolkit. ERM combines two well-understood techniques: minibatch sampling and gradient-based optimization using backpropagation. SSL, on the other hand, adds clever, less well understood techniques to the training pipeline.
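
To make the kind of pseudo-task mentioned above concrete, the following is a minimal sketch of a CLIP-style symmetric contrastive (InfoNCE) loss on a minibatch of paired embeddings, where each row of one batch is matched to the corresponding row of the other (e.g., image and caption, or two augmentations of the same image). The function name, the NumPy implementation, and the temperature value are illustrative assumptions, not code or notation from this paper.

```python
import numpy as np

def clip_style_contrastive_loss(z_a, z_b, temperature=0.07):
    """Symmetric InfoNCE loss over a minibatch of paired embeddings.

    z_a, z_b: (batch, dim) embeddings of the two views or modalities;
    the i-th row of z_a and the i-th row of z_b form a positive pair.
    (Illustrative sketch only, not the construction studied in the paper.)
    """
    # L2-normalize so inner products are cosine similarities.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)

    logits = z_a @ z_b.T / temperature   # (batch, batch) similarity matrix
    labels = np.arange(len(z_a))         # the diagonal entries are the positives

    def cross_entropy(logits, labels):
        logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
        log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(labels)), labels].mean()

    # Symmetrize: match view a to view b and view b to view a.
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))

# Example usage with random embeddings for a minibatch of 8 pairs.
rng = np.random.default_rng(0)
z_img, z_txt = rng.normal(size=(8, 32)), rng.normal(size=(8, 32))
print(clip_style_contrastive_loss(z_img, z_txt))
```

Note that this loss is computed over the whole minibatch at once (each pair is contrasted against the other pairs in the batch), which is one way in which SSL training departs from the per-example losses of the standard ERM setup described above.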
