A Data Generation Perspective to the Mechanism of In-Context Learning
Mao, Haitao, Liu, Guangliang, Ma, Yao, Wang, Rongrong, Tang, Jiliang
arXiv.org Artificial Intelligence
In-Context Learning (ICL) empowers Large Language Models (LLMs) to learn in context: given only a few in-context examples, they achieve downstream generalization without any gradient updates. Despite this encouraging empirical success, the underlying mechanism of ICL remains unclear, and existing research offers a variety of perspectives on it. These studies propose intuition-driven, ad-hoc technical solutions for interpreting ICL, leaving the overall road map ambiguous. In this paper, we leverage a data generation perspective to reinterpret recent efforts and show that popular technical solutions have potentially broader uses, moving toward a more systematic view. For a conceptual definition, we rigorously adopt the terms skill learning and skill recognition. The difference between them is that skill learning can learn new data generation functions from in-context data. We also provide a comprehensive study of the merits and weaknesses of different solutions and highlight their uniformity from the data generation perspective, establishing a technical foundation for future research to combine the strengths of different lines of work.
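The distinction between skill recognition and skill learning rests on whether a new data generation function can be acquired from the in-context examples. The toy sketch below is our own illustration, not the authors' formulation: it assumes each data generation function is a linear map y = w·x, and that pre-training exposed the model to a small fixed set of such maps (the hypothetical `PRETRAINED_WS`). Recognition can only pick the best-fitting candidate from that set, while learning fits w directly from the context by least squares and can therefore recover a function outside the set.

```python
import numpy as np

# Hypothetical toy setup (an assumption for illustration): each "data generation
# function" is a linear map y = w @ x, and PRETRAINED_WS stands in for the
# functions seen during pre-training.
PRETRAINED_WS = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])


def skill_recognition(xs, ys, candidates=PRETRAINED_WS, noise_std=0.1):
    """Pick the pre-trained candidate w with the highest Gaussian log-likelihood."""
    # residuals has shape (n_candidates, n_examples)
    residuals = ys[None, :] - candidates @ xs.T
    log_lik = -0.5 * np.sum(residuals ** 2, axis=1) / noise_std ** 2
    return candidates[np.argmax(log_lik)]


def skill_learning(xs, ys):
    """Fit a new w by least squares using only the in-context examples."""
    w, *_ = np.linalg.lstsq(xs, ys, rcond=None)
    return w


# Usage: in-context examples drawn from a function NOT in PRETRAINED_WS.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
xs = rng.normal(size=(8, 2))
ys = xs @ true_w

recognized = skill_recognition(xs, ys)  # always one of the pre-trained candidates
learned = skill_learning(xs, ys)        # recovers true_w from the context
print("recognized:", recognized)
print("learned:", np.round(learned, 3))
```

Under these assumptions, recognition is bounded by the candidate set, whereas learning is bounded only by what the context examples determine, which mirrors the difference the abstract draws between the two terms.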
Feb-3-2024
- Country:
  - Asia > Middle East
    - Jordan (0.04)
  - North America > United States
    - Michigan (0.04)
- Genre:
  - Research Report (1.00)
- Technology: