Deriving Neural Scaling Laws from the statistics of natural language
