On the Optimizer Dependence of Neural Scaling Laws
