Larger language models do in-context learning differently

Jerry Wei, Jason Wei, Yi Tay, Dustin Tran, Albert Webson, Yifeng Lu, Xinyun Chen, Hanxiao Liu, Da Huang, Denny Zhou, Tengyu Ma

Mar-8-2023–arXiv.org Artificial Intelligence

We study how in-context learning (ICL) in language models is affected by semantic priors versus input-label mappings. We investigate two setups, ICL with flipped labels and ICL with semantically-unrelated labels, across various model families (GPT-3, InstructGPT, Codex, PaLM, and Flan-PaLM). First, experiments on ICL with flipped labels show that overriding semantic priors is an emergent ability of model scale. While small language models ignore flipped labels presented in-context and thus rely primarily on semantic priors from pretraining, large models can override semantic priors when presented with in-context exemplars that contradict priors, despite the stronger semantic priors that larger models may hold. We next study semantically-unrelated label ICL (SUL-ICL), in which labels are semantically unrelated to their inputs (e.g., foo/bar instead of negative/positive), thereby forcing language models to learn the input-label mappings shown in in-context exemplars in order to perform the task. The ability to do SUL-ICL also emerges primarily with scale, and large-enough language models can even perform linear classification in a SUL-ICL setting. Finally, we evaluate instruction-tuned models and find that instruction tuning strengthens both the use of semantic priors and the capacity to learn input-label mappings, but more of the former.
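The two setups differ only in how exemplar labels are rewritten before they are shown to the model, which a short sketch can make concrete. The prompt template, the example sentences, and the names `build_prompt`, `REGULAR`, `FLIPPED`, and `SUL` below are illustrative assumptions, not the format used in the paper.

```python
# Hypothetical sentiment exemplars, invented for illustration only.
EXEMPLARS = [
    ("A warm, funny, beautifully acted film.", "positive"),
    ("Tedious and far too long.", "negative"),
    ("I would happily watch it again.", "positive"),
    ("The plot makes no sense at all.", "negative"),
]

# Each setup is a mapping from the true label to the label shown in-context.
REGULAR = {"positive": "positive", "negative": "negative"}  # standard ICL
FLIPPED = {"positive": "negative", "negative": "positive"}  # contradicts semantic priors
SUL = {"positive": "foo", "negative": "bar"}                # semantically unrelated labels


def build_prompt(exemplars, query, label_map):
    """Format (text, label) exemplars and a query into a few-shot prompt.

    Only the displayed label is rewritten via label_map; the input text
    is left unchanged. The "Input:/Label:" template is an assumption.
    """
    blocks = [
        f"Input: {text}\nLabel: {label_map[label]}\n"
        for text, label in exemplars
    ]
    # The query is left unlabeled so the model has to complete it.
    blocks.append(f"Input: {query}\nLabel:")
    return "\n".join(blocks)


if __name__ == "__main__":
    query = "One of the best movies of the year."
    for name, mapping in [("regular", REGULAR), ("flipped", FLIPPED), ("SUL", SUL)]:
        print(f"--- {name} ---")
        print(build_prompt(EXEMPLARS, query, mapping))
        print()
```

Under this sketch, a model that follows the in-context mapping would complete the flipped prompt with "negative" for a positive review, while a model that relies on its semantic prior would answer "positive". In the SUL prompt the labels carry no semantic hint, so the mapping can only be inferred from the exemplars.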

artificial intelligence, machine learning, natural language


