Understanding and Enhancing Mask-Based Pretraining towards Universal Representations

Jun-22-2026, 20:45:56 GMT–Neural Information Processing Systems

Mask-based pretraining has become a cornerstone of modern large-scale models across language, vision, and recently biology. Despite its empirical success, its role and limits in learning data representations have been unclear. In this work, we show that the behavior of mask-based pretraining can be directly characterized by test risk in high-dimensional minimum-norm ("ridge-less") linear regression, without relying on further model specifications.
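As a rough illustration of the quantity the abstract refers to, the sketch below estimates the excess test risk of a minimum-norm ("ridge-less") least-squares fit. It is not the paper's setup. The isotropic Gaussian features, the unit-norm ground-truth vector, and the function names `min_norm_fit` and `excess_test_risk` are all assumptions made for the example.

```python
import numpy as np


def min_norm_fit(X, y):
    """Minimum-norm least-squares solution via the Moore-Penrose pseudoinverse."""
    return np.linalg.pinv(X) @ y


def excess_test_risk(n, p, noise_std=0.0, seed=0):
    """Excess test risk of the minimum-norm fit on synthetic data.

    Assumed toy model (not taken from the paper): rows of X are N(0, I_p),
    y = X @ beta + noise, and beta has unit Euclidean norm.
    """
    rng = np.random.default_rng(seed)
    beta = rng.standard_normal(p)
    beta /= np.linalg.norm(beta)
    X = rng.standard_normal((n, p))
    y = X @ beta + noise_std * rng.standard_normal(n)
    beta_hat = min_norm_fit(X, y)
    # For an isotropic test point x ~ N(0, I_p), the expected squared
    # prediction error E[(x @ (beta_hat - beta))**2] equals
    # ||beta_hat - beta||^2.
    return float(np.sum((beta_hat - beta) ** 2))


if __name__ == "__main__":
    # Sweep the feature dimension p across the interpolation threshold p = n.
    for p in (20, 50, 90, 110, 200, 400):
        print(p, excess_test_risk(n=100, p=p, noise_std=0.5))
```

Sweeping `p` past `n` in this toy model is one way to see how the risk of the interpolating solution depends on the ratio of dimension to sample size.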

large language model, machine learning, natural language

Genre:
- Research Report
  - New Finding (1.00)
  - Experimental Study (1.00)

Industry:
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Information Technology (0.67)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (0.93)
  - Vision (0.93)
  - Machine Learning
    - Neural Networks > Deep Learning (1.00)
    - Statistical Learning (0.66)