Provable Scaling Laws of Feature Emergence from Learning Dynamics of Grokking
