Revisiting the Scaling Properties of Downstream Metrics in Large Language Model Training
