Unveiling Induction Heads: Provable Training Dynamics and Feature Learning in Transformers

Siyu Chen, Department of Statistics and Data Science, Yale University

Neural Information Processing Systems 

In particular, most existing work only theoretically explains how the attention mechanism facilitates in-context learning (ICL) under certain data models.
