Interpretable Next-token Prediction via the Generalized Induction Head
