Scaling Law with Learning Rate Annealing
Neural Information Processing Systems
We find that the cross-entropy loss curves of neural language models empirically follow a scaling law with learning rate (LR) annealing over training steps: $L(s) = L_0 + A \cdot S_1^{-\alpha} - C \cdot S_2$, where $L(s)$ is the validation loss at step $s$, $S_1$ is the area under the LR curve, $S_2$ is the LR annealing area, and $L_0$, $A$, $C$, $\alpha$ are constant parameters.
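To make the formula concrete, here is a minimal Python sketch that computes $S_1$ and $S_2$ from a per-step LR schedule and evaluates $L(s)$ at the final step. The abstract does not say how $S_2$ is computed, so the momentum-style accumulation of per-step LR decreases (with a decay factor `decay`) is an assumption here. With `decay=0` it reduces to the plain total LR drop. The function names `lr_areas` and `predicted_loss`, and all constants in the usage block, are hypothetical and are not fitted values from the paper.

```python
from typing import Sequence, Tuple


def lr_areas(lrs: Sequence[float], decay: float = 0.999) -> Tuple[float, float]:
    """Return (S1, S2) for a non-empty LR schedule given as one LR per step.

    S1: discrete area under the LR curve (sum of per-step LRs).
    S2: LR annealing area. ASSUMPTION: computed as a momentum-style sum of
        per-step LR decreases with decay factor `decay`; decay=0 gives the
        plain total drop lrs[0] - lrs[-1].
    """
    s1 = float(sum(lrs))
    s2 = 0.0
    momentum = 0.0
    prev = lrs[0]
    for lr in lrs:
        # Fold this step's LR decrease into a decaying momentum term,
        # then add the momentum to the running annealing area.
        momentum = decay * momentum + (prev - lr)
        s2 += momentum
        prev = lr
    return s1, s2


def predicted_loss(lrs: Sequence[float], L0: float, A: float, C: float,
                   alpha: float, decay: float = 0.999) -> float:
    """Evaluate L(s) = L0 + A * S1**(-alpha) - C * S2 at the last step s."""
    s1, s2 = lr_areas(lrs, decay)
    return L0 + A * s1 ** (-alpha) - C * s2


if __name__ == "__main__":
    # Hypothetical schedules: 2000 steps with peak LR 1e-3.
    steps, peak = 2000, 1e-3
    constant = [peak] * steps
    # Hold the LR for the first half, then decay linearly to 0.
    half = steps // 2
    annealed = [peak] * half + [peak * (1 - i / half) for i in range(1, half + 1)]
    # Hypothetical constants, chosen only for illustration.
    params = dict(L0=2.0, A=0.5, C=1.0, alpha=0.5)
    for name, lrs in [("constant", constant), ("annealed", annealed)]:
        s1, s2 = lr_areas(lrs)
        print(f"{name}: S1={s1:.4f} S2={s2:.4f} L={predicted_loss(lrs, **params):.4f}")
```

In this form, annealing gives up some forward area $S_1$, which raises the $A \cdot S_1^{-\alpha}$ term, in exchange for annealing area $S_2$, which lowers the loss through $-C \cdot S_2$. The printed numbers depend entirely on the hypothetical constants.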
Jun-19-2026, 04:46:34 GMT
- Country:
  - North America > United States (0.67)
- Genre:
  - Workflow (0.69)
  - Research Report
    - New Finding (1.00)
    - Experimental Study (1.00)
- Technology: