How Transformers Implement Induction Heads: Approximation and Optimization Analysis
