A Billion-Token-Scale Pre-training Corpus for Math

Zengzhi Wang (1,3,4), Xuefeng Li