A Data Generation Perspective to the Mechanism of In-Context Learning

Mao, Haitao, Liu, Guangliang, Ma, Yao, Wang, Rongrong, Tang, Jiliang

Feb-3-2024–arXiv.org Artificial Intelligence

In-Context Learning (ICL) empowers Large Language Models (LLMs) to learn in context, achieving downstream generalization from a few in-context examples without gradient updates. Despite this encouraging empirical success, the underlying mechanism of ICL remains unclear, and existing research offers various viewpoints for understanding it. These studies propose intuition-driven and ad hoc technical solutions for interpreting ICL, leaving the road map ambiguous. In this paper, we leverage a data generation perspective to reinterpret recent efforts and demonstrate the potential broader usage of popular technical solutions, moving toward a systematic view. For a conceptual definition, we rigorously adopt the terms skill learning and skill recognition. The difference between them is that skill learning can learn new data generation functions from in-context data. We also provide a comprehensive study of the merits and weaknesses of different solutions and highlight the uniformity among them from the perspective of data generation, establishing a technical foundation for future research to incorporate the strengths of different lines of research.

large language model, machine learning, natural language

Country:
- North America > United States
  - Michigan (0.04)
- Asia > Middle East
  - Jordan (0.04)

Genre:
- Research Report (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Representation & Reasoning > Uncertainty
    - Bayesian Inference (0.48)
  - Machine Learning
    - Neural Networks (0.93)
    - Statistical Learning > Gradient Descent (0.36)
    - Learning Graphical Models > Undirected Networks
      - Markov Models (0.46)