A Random Matrix Theory of Masked Self-Supervised Regression
Arie Wortsman, Federica Gerace, Bruno Loureiro, Yue M. Lu
Self-supervised learning (SSL), a training paradigm in which models learn useful representations from unlabeled data by exploiting the data itself as a source of supervision, has emerged as a foundational component of the recent success of transformer architectures. By dispensing with manual annotations, SSL retains many of the benefits traditionally associated with supervised learning without relying on labeled data. Consequently, SSL is widely adopted as a pretraining paradigm for learning general-purpose representations that substantially accelerate the optimization of downstream tasks, especially in data-scarce settings. A canonical example of a self-supervised learning task is masked language modeling (MLM), in which a neural network is trained to predict masked tokens in text using the remaining tokens as contextual information (Devlin et al., 2019a; Howard and Ruder, 2018; Radford et al., 2018; Brown et al., 2020; OpenAI, 2024). For example, given the sentence "The capital of France is Paris", a typical MLM task masks the word "capital", yielding "The [MASK] of France is Paris", and trains the model to infer the missing word from the context supplied by "France" and "Paris".
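To make the masked-prediction objective concrete, the sketch below casts it as a regression problem on vector data: one coordinate of each sample is hidden, and a linear model is fit to reconstruct it from the remaining coordinates. This is purely illustrative; the low-rank data model, the ridge penalty, and all variable names are assumptions of the sketch, not the setup analyzed in the paper.

```python
# Minimal sketch of masked self-supervised regression (illustrative
# assumptions throughout: the rank-one data model, the ridge penalty,
# and the choice of masked coordinate are not taken from the paper).
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 20                      # number of samples, feature dimension

# Correlated data via a rank-one latent factor, so the masked
# coordinate carries information recoverable from the context.
f = rng.standard_normal(d)          # feature loadings
u = rng.standard_normal((n, 1))     # per-sample latent variable
X = u * f + 0.5 * rng.standard_normal((n, d))

# "Mask" one coordinate: the rest of the vector is the context.
mask_idx = 0
context = np.delete(X, mask_idx, axis=1)
target = X[:, mask_idx]

# Ridge regression: predict the masked coordinate from the context.
lam = 1e-2                          # regularization strength (arbitrary)
A = context.T @ context + lam * np.eye(d - 1)
w = np.linalg.solve(A, context.T @ target)

mse = np.mean((target - context @ w) ** 2)
print(f"variance of masked coordinate: {target.var():.3f}")
print(f"reconstruction MSE from context: {mse:.3f}")
```

In this toy setting the shared latent factor makes the masked coordinate partially predictable, so the reconstruction error falls well below the coordinate's raw variance; the error of masked regression in high dimensions is the kind of quantity the paper's random-matrix analysis concerns.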
February 2, 2026