Resolving Discrepancies in Compute-Optimal Scaling of Language Models

Neural Information Processing Systems 

We explain the discrepancy between the Kaplan et al. and Hoffmann et al. scaling laws by reproducing the Kaplan et al. scaling law on two datasets (OpenWebText2 and RefinedWeb)
