Toward Understanding In-context vs. In-weight Learning
Chan, Bryan, Chen, Xinyi, György, András, Schuurmans, Dale
arXiv.org Artificial Intelligence
It has recently been demonstrated empirically that in-context learning emerges in transformers when certain distributional properties are present in the training data, but that this ability can diminish with further training. We provide a new theoretical understanding of these phenomena by identifying simplified distributional properties that give rise to the emergence and eventual disappearance of in-context learning. We do so by first analyzing a simplified model that uses a gating mechanism to choose between an in-weight and an in-context predictor. Through a combination of generalization error and regret analysis, we identify conditions under which in-context and in-weight learning emerge. These theoretical findings are then corroborated experimentally by comparing the behaviour of a full transformer on the simplified distributions to that of the stylized model, demonstrating aligned results. We then extend the study to a full large language model, showing how fine-tuning on various collections of natural language prompts can elicit similar in-context and in-weight learning behaviour.
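To make the gating idea in the abstract concrete, the following is a minimal sketch of a predictor that mixes an in-weight and an in-context prediction. Everything specific here is an assumption for illustration and is not taken from the paper: the function names, the linear in-weight classifier, the similarity-weighted in-context predictor, and the soft sigmoid gate (the paper's stylized model may choose between the predictors differently).

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def in_weight_predict(weights, query):
    """Class probabilities from stored parameters only.

    Hypothetical choice: one weight vector per class, softmax over scores.
    """
    return softmax([dot(w, query) for w in weights])


def in_context_predict(context, query, num_classes):
    """Class probabilities from (x, label) pairs in the context.

    Hypothetical choice: each context label is weighted by the
    softmax-normalised similarity between its x and the query.
    """
    attn = softmax([dot(x, query) for x, _ in context])
    probs = [0.0] * num_classes
    for a, (_, label) in zip(attn, context):
        probs[label] += a
    return probs


def gated_predict(gate_weights, weights, context, query, num_classes):
    """Mix both predictors with a sigmoid gate g(query) in (0, 1).

    g near 1 relies on the context; g near 0 relies on the weights.
    The soft mixture is an assumption of this sketch.
    """
    g = 1.0 / (1.0 + math.exp(-dot(gate_weights, query)))
    p_iw = in_weight_predict(weights, query)
    p_ic = in_context_predict(context, query, num_classes)
    return [g * c + (1.0 - g) * w for c, w in zip(p_ic, p_iw)]


# Toy usage: two classes, two context examples, an uninformative gate (g = 0.5).
context = [([1.0, 0.0], 0), ([0.0, 1.0], 1)]
weights = [[0.5, -0.5], [-0.5, 0.5]]
probs = gated_predict([0.0, 0.0], weights, context, [1.0, 0.0], 2)
```

Because both component predictors return probability vectors and the gate forms a convex combination, `probs` is itself a probability vector; training the gate then amounts to learning, per input, which predictor to trust.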
Oct-30-2024
- Country:
  - South America > Argentina
    - Pampas > Buenos Aires F.D. > Buenos Aires (0.04)
  - North America
  - Europe
    - United Kingdom > England
      - Cambridgeshire > Cambridge (0.04)
    - Slovakia > Bratislava
      - Bratislava (0.04)
    - Poland > Masovia Province
      - Warsaw (0.05)
    - Hungary > Budapest
      - Budapest (0.04)
    - Germany > Hesse
      - Darmstadt Region > Frankfurt (0.04)
  - Asia > Indonesia
    - Bali (0.04)
- Genre:
  - Research Report (0.82)
- Technology: