Pre-Training to Learn in Context

Gu, Yuxian, Dong, Li, Wei, Furu, Huang, Minlie

May-15-2023–arXiv.org Artificial Intelligence

In-context learning, where pre-trained language models learn to perform tasks from task examples and instructions in their contexts, has attracted much attention in the NLP community. However, this ability is not fully exploited because language models are not explicitly trained to learn in context. To this end, we propose PICL (Pre-training for In-Context Learning), a framework that enhances a language model's in-context learning ability by pre-training it on a large collection of "intrinsic tasks" found in a general plain-text corpus, using the simple language modeling objective. PICL encourages the model to infer and perform tasks by conditioning on its context while maintaining the task generalization of pre-trained models. We evaluate the in-context learning performance of models trained with PICL on seven widely used text classification datasets and on the Super-NaturalInstructions benchmark, which contains 100+ NLP tasks formulated as text generation. Our experiments show that PICL is more effective and task-generalizable than a range of baselines, outperforming larger language models with nearly 4x as many parameters. The code is publicly available at https://github.com/thu-coai/PICL.
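To make the in-context setting concrete, the following is a minimal sketch of how task examples and a query might be concatenated into a single context for a language model. It is an illustration only, not the PICL implementation: the function name `build_icl_prompt`, the "Input:/Output:" template, and the sample sentences are all assumptions introduced here.

```python
from typing import List, Tuple


def build_icl_prompt(demos: List[Tuple[str, str]], query: str, sep: str = "\n\n") -> str:
    """Concatenate labeled demonstrations and an unlabeled query into one context.

    The "Input:/Output:" template is a hypothetical choice for illustration;
    the abstract does not specify a template.
    """
    # Each demonstration shows the model one solved instance of the task.
    blocks = [f"Input: {x}\nOutput: {y}" for x, y in demos]
    # The query is left open so the model completes the final "Output:".
    blocks.append(f"Input: {query}\nOutput:")
    return sep.join(blocks)


# Hypothetical sentiment classification demonstrations.
demos = [
    ("The film was a delight.", "positive"),
    ("I want my money back.", "negative"),
]
prompt = build_icl_prompt(demos, "A dull, lifeless script.")
print(prompt)
```

The model is expected to infer the task from the demonstrations alone and continue the text after the final "Output:", with no parameter update at inference time.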

large language model, machine learning, natural language

Genre:
- Research Report (0.50)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (0.89)
    - Text Classification (0.89)
  - Machine Learning > Neural Networks
    - Deep Learning (0.94)
