Scaling Data-Constrained Language Models

Neural Information Processing Systems 

We propose and empirically validate a scaling law for compute optimality that accounts for the decreasing value of repeated tokens and excess parameters.
