Resolving Discrepancies in Compute-Optimal Scaling of Language Models
Neural Information Processing Systems
We explain the discrepancy by reproducing the Kaplan et al. scaling law on two datasets (OpenWebText2 and RefinedWeb)
Feb-17-2026, 16:08:32 GMT
- Country:
  - Asia > Middle East
    - Israel > Tel Aviv District
      - Tel Aviv (0.04)
    - Jordan (0.04)
  - Europe
    - Germany (0.04)
    - Italy > Calabria
      - Catanzaro Province > Catanzaro (0.04)
- Genre:
  - Research Report
    - Experimental Study (1.00)
    - New Finding (1.00)
- Technology: