Predicting Grokking Long Before it Happens: A look into the loss landscape of models which grok
