Critical Data Size of Language Models from a Grokking Perspective
