How Transformers Learn In-Context Recall Tasks? Optimality, Training Dynamics and Generalization
