Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test
