A Billion-Token-Scale Pre-training Corpus for Math

Zengzhi Wang 1,3,4 Xuefeng Li

Neural Information Processing Systems 

High-quality, large-scale corpora are the cornerstone of building foundation models.
