Towards Understanding How Transformers Learn In-context Through a Representation Learning Lens
Neural Information Processing Systems
Pre-trained large language models based on Transformers have demonstrated remarkable in-context learning (ICL) abilities: given only a few demonstration examples, the models can perform new tasks without any parameter updates. However, the mechanism underlying ICL remains an open question. In this paper, we explore the ICL process in Transformers through the lens of representation learning. Leveraging kernel methods, we first derive a dual model for a single softmax attention layer.
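To make the kernel view concrete, here is a minimal sketch of the standard reading that the abstract alludes to: the unnormalized softmax attention weight exp(q·k/√d) can be treated as a kernel κ(q, k), so the attention output is a kernel-weighted (Nadaraya–Watson-style) average of the values. This is only the starting point of the paper's dual-model construction, not the construction itself; the function name and shapes below are illustrative.

```python
import numpy as np

def softmax_attention_as_kernel(Q, K, V):
    """One softmax attention layer written as a kernel smoother.

    The exponential kernel kappa(q, k) = exp(q . k / sqrt(d)) supplies the
    unnormalized weights; normalizing over keys recovers softmax attention.
    Shapes: Q (n_q, d), K (n_k, d), V (n_k, d_v).
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    # Subtracting the row max stabilizes exp(); the shift cancels after
    # normalization, so the resulting weights are exactly softmax(scores).
    kappa = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = kappa / kappa.sum(axis=-1, keepdims=True)
    return weights @ V  # kernel-weighted average of the value vectors

# Tiny usage example: 5 in-context key/value pairs, 2 queries.
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))
K = rng.normal(size=(5, 4))
V = rng.normal(size=(5, 3))
print(softmax_attention_as_kernel(Q, K, V).shape)  # (2, 3)
```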