Towards Understanding How Transformers Learn In-context Through a Representation Learning Lens

May-28-2025, 06:38:43 GMT–Neural Information Processing Systems

Pre-trained large language models based on Transformers have demonstrated remarkable in-context learning (ICL) abilities: given just a few demonstration examples, they can perform new tasks without any parameter updates. However, the mechanism behind ICL remains an open question. In this paper, we explore the ICL process in Transformers through the lens of representation learning. First, leveraging kernel methods, we derive a dual model for a single softmax attention layer.
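As background for the kernel view mentioned in the abstract, the following is a minimal sketch of a standard identity, not the paper's dual model: the output of one softmax attention head for a single query can be written as a kernel-weighted average of the values under an exponential kernel. The function names, shapes, and the 1/sqrt(d) scaling are assumptions chosen for illustration.

```python
import numpy as np

def softmax_attention(q, K, V):
    """Single-query softmax attention: softmax(K q / sqrt(d)) @ V."""
    d = q.shape[-1]
    scores = K @ q / np.sqrt(d)
    scores = scores - scores.max()      # subtract max for numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum()
    return weights @ V

def exp_kernel_smoother(q, K, V):
    """Same quantity written as a normalized kernel-weighted average.

    Assumed kernel (illustrative): k(q, k_i) = exp(<q, k_i> / sqrt(d)).
    """
    d = q.shape[-1]
    kernel_vals = np.exp(K @ q / np.sqrt(d))
    return (kernel_vals[:, None] * V).sum(axis=0) / kernel_vals.sum()

# Hypothetical toy data: 5 demonstration tokens, key dim 4, value dim 3.
rng = np.random.default_rng(0)
K = rng.normal(size=(5, 4))
V = rng.normal(size=(5, 3))
q = rng.normal(size=4)

out_attn = softmax_attention(q, K, V)
out_kern = exp_kernel_smoother(q, K, V)
```

The two outputs agree up to floating-point error, which is the sense in which softmax attention can be read through kernel methods; how the paper builds its dual model on top of this is not specified in the abstract.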

large language model, machine learning, natural language

Country:
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre:
- Research Report
  - Experimental Study (0.92)
  - New Finding (0.92)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning
    - Neural Networks > Deep Learning (0.45)
    - Statistical Learning > Gradient Descent (0.31)
  - Natural Language > Large Language Model (0.88)
  - Representation & Reasoning (1.00)