Critical Data Size of Language Models from a Grokking Perspective