Resolving Discrepancies in Compute-Optimal Scaling of Language Models
Neural Information Processing Systems
We explain the discrepancy by reproducing the Kaplan et al. scaling law on two datasets (OpenWebText2 and RefinedWeb)
Oct-10-2025, 14:12:10 GMT
- Country:
  - Europe
    - Germany (0.04)
    - Italy > Calabria
      - Catanzaro Province > Catanzaro (0.04)
  - Asia > Middle East
    - Jordan (0.04)
    - Israel > Tel Aviv District
      - Tel Aviv (0.04)
- Genre:
  - Research Report
    - New Finding (1.00)
    - Experimental Study (1.00)
- Technology: