Resolving Discrepancies in Compute-Optimal Scaling of Language Models
