When Do Language Models Need Billions of Words in Their Datasets?