Data Distributional Properties Drive Emergent In-Context Learning in Transformers

May-27-2025, 10:33:20 GMT - Neural Information Processing Systems

Large transformer-based models can perform in-context few-shot learning without being explicitly trained for it. This observation raises the question: what aspects of the training regime lead to this emergent behavior? Here, we show that this behavior is driven by the distribution of the training data itself. In-context learning emerges when the training data exhibits particular distributional properties, such as burstiness (items appear in clusters rather than being uniformly distributed over time) and a large number of rarely occurring classes. In-context learning also emerges more strongly when item meanings or interpretations are dynamic rather than fixed.
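To make the two distributional properties concrete, the following is a minimal sketch of how a training sequence of class labels with those properties might be sampled. It is an illustration under stated assumptions, not the authors' data pipeline: the power-law (Zipf-like) class weights are one assumed way to get many rarely occurring classes, and the function names and parameters (`zipf_weights`, `sample_sequence`, `burst_size`, `exponent`) are hypothetical.

```python
import random
from collections import Counter


def zipf_weights(num_classes, exponent=1.0):
    """Unnormalized power-law weights: the class at rank k gets 1 / k**exponent.

    Assumption: a Zipf-like marginal is used here only as one way to produce
    a few common classes and a long tail of rare ones.
    """
    return [1.0 / (rank ** exponent) for rank in range(1, num_classes + 1)]


def sample_sequence(num_classes, seq_len, burst_size, bursty, rng, exponent=1.0):
    """Sample a sequence of class labels (hypothetical helper).

    bursty=False: every position is drawn independently from the marginal.
    bursty=True:  classes are drawn from the same marginal, but each drawn
                  class is repeated `burst_size` times within the sequence,
                  so items appear in clusters.
    """
    if seq_len % burst_size != 0:
        raise ValueError("seq_len must be a multiple of burst_size")
    classes = list(range(num_classes))
    weights = zipf_weights(num_classes, exponent)
    if not bursty:
        return rng.choices(classes, weights=weights, k=seq_len)
    picked = rng.choices(classes, weights=weights, k=seq_len // burst_size)
    sequence = [label for label in picked for _ in range(burst_size)]
    rng.shuffle(sequence)  # positions are mixed; co-occurrence in the sequence remains
    return sequence


if __name__ == "__main__":
    rng = random.Random(0)
    bursty_seq = sample_sequence(1000, 8, 4, bursty=True, rng=rng)
    uniform_seq = sample_sequence(1000, 8, 4, bursty=False, rng=rng)
    print("bursty:    ", bursty_seq, Counter(bursty_seq))
    print("non-bursty:", uniform_seq, Counter(uniform_seq))
```

In the bursty case every class that appears in a sequence appears at least `burst_size` times, so the context itself carries reusable information about that class. In the non-bursty case a rare class will usually appear once or not at all.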

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (0.57)
  - Machine Learning > Neural Networks
    - Deep Learning (0.40)