In-Context Learning with Representations: Contextual Generalization of Trained Transformers

Tong Yang (CMU), Yu Huang

Neural Information Processing Systems 

This paper investigates the training dynamics of transformers trained by gradient descent, through the lens of non-linear regression tasks.