Unveiling Induction Heads: Provable Training Dynamics and Feature Learning in Transformers

Department of Statistics and Data Science, Yale University