A Generalization Theory for Zero-Shot Prediction

Jul-15-2025–arXiv.org Machine Learning

In 2021, OpenAI shocked the world by improving the zero-shot classification accuracy on ImageNet from 11.5% to 76.2% via the CLIP series of models (Radford et al., 2021). This event redefined the goal of zero-shot prediction: from producing models that generalize to unseen classes to producing models that generalize to entirely unseen tasks. Two fundamental drivers of CLIP's success were 1) the use of natural language as a medium for representing arbitrary classes (as in the previous state-of-the-art Visual N-grams (Li et al., 2017)), and 2) a massive yet carefully designed pre-training set, which significantly impacted downstream performance (Radford et al., 2021; Fang et al., 2023; Xu et al., 2024). Despite the remarkable success of these foundation model-based pipelines (Bommasani et al., 2022), there are unique components of zero-shot prediction that warrant investigation from a theoretical point of view. To clarify these gaps, we contrast zero-shot prediction (ZSP) with the related setting of few-shot learning (FSL). Let x ∈ X denote an input (often an image) that accompanies a discrete value y ∈ Y (often a class label).
