Scaling Optimal LR Across Token Horizons
