MATHPILE: A Billion-Token-Scale Pre-training Corpus for Math
Neural Information Processing Systems
High-quality, diverse pre-training corpora form the cornerstone for developing powerful foundation models, enabling AI assistants like ChatGPT [47] to exhibit balanced competencies across a broad spectrum of tasks [11].
Feb-10-2026, 08:44:02 GMT
- Country:
  - Asia
  - Europe
    - Austria (0.04)
    - France > Provence-Alpes-Côte d'Azur
      - Bouches-du-Rhône > Marseille (0.04)
    - Ireland > Leinster
      - County Dublin > Dublin (0.04)
    - Spain > Valencian Community
      - Valencia Province > Valencia (0.04)
    - United Kingdom > Scotland
      - City of Edinburgh > Edinburgh (0.04)
  - North America
    - Canada
    - Dominican Republic (0.04)
    - United States
      - Louisiana > Orleans Parish
        - New Orleans (0.04)
      - Minnesota > Hennepin County
        - Minneapolis (0.28)
      - New York > New York County
        - New York City (0.04)
  - South America
    - Brazil (0.04)
    - Chile > Santiago Metropolitan Region
      - Santiago Province > Santiago (0.04)
- Genre:
  - Instructional Material (0.67)
  - Research Report (0.67)
- Industry:
  - Law (0.67)
- Technology: