From Shortcut to Induction Head: How Data Diversity Shapes Algorithm Selection in Transformers
