Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate

Feb-5-2026, 06:35:11 GMT–Neural Information Processing Systems

Recent work (e.g., Li & Arora, 2020) suggests that popular normalization schemes, including Batch Normalization, can push today's deep learning far from the traditional optimization viewpoint; for example, such networks can be trained with exponentially increasing learning rates. The current paper highlights further ways in which the behavior of normalized nets departs from traditional viewpoints. It then initiates a formal framework for studying their mathematics by suitably adapting the conventional framework, namely, modeling the SGD-induced training trajectory via a stochastic differential equation (SDE) whose noise term captures gradient noise. This yields: (a) a new "intrinsic learning rate" parameter, the product $\eta\lambda$ of the usual learning rate $\eta$ and the weight decay factor $\lambda$. Analysis of the SDE shows how the effective speed of learning varies and equilibrates over time under the control of the intrinsic LR.
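
To make the role of the product $\eta\lambda$ concrete, here is a minimal toy sketch. It is not the paper's SDE analysis. It assumes (our simplification) a scale-invariant loss, as normalization layers induce for the weights feeding into them, in a pure-noise regime where each stochastic gradient is a random direction orthogonal to $w$ with norm $\sigma/\|w\|$. Under that assumption the squared norm $r_t = \|w_t\|^2$ obeys $r_{t+1} = (1-\eta\lambda)^2 r_t + \eta^2\sigma^2/r_t$, so it settles where the angular step per iteration, $\eta\sigma/\|w\|^2$, equals $\sqrt{\eta\lambda(2-\eta\lambda)} \approx \sqrt{2\eta\lambda}$, which depends on $\eta$ and $\lambda$ only through their product. The names `simulate` and `predicted_angle`, and all default parameter values, are hypothetical.

```python
import math
import random


def predicted_angle(eta, lam):
    """Equilibrium angular step sqrt(eta*lam*(2 - eta*lam)) of the toy model below."""
    x = eta * lam
    return math.sqrt(x * (2.0 - x))


def simulate(eta, lam, sigma=1.0, dim=10, steps=6000, seed=0):
    """Toy SGD with weight decay on a hypothetical scale-invariant loss.

    Illustrative assumption: each stochastic gradient is a random direction
    orthogonal to w, rescaled to norm sigma / ||w|| (pure-noise regime).
    Returns (final ||w||^2, final angular step eta * sigma / ||w||^2).
    """
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for _ in range(steps):
        norm_sq = sum(x * x for x in w)
        # Random direction with its component along w projected out.
        g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        coef = sum(a * b for a, b in zip(g, w)) / norm_sq
        g = [a - coef * b for a, b in zip(g, w)]
        g_norm = math.sqrt(sum(a * a for a in g))
        # Scale invariance: the gradient norm shrinks like 1 / ||w||.
        scale = sigma / (g_norm * math.sqrt(norm_sq))
        # SGD step with weight decay: w <- w - eta * (grad + lam * w).
        w = [(1.0 - eta * lam) * x - eta * scale * a for x, a in zip(w, g)]
    norm_sq = sum(x * x for x in w)
    return norm_sq, eta * sigma / norm_sq


# Two settings that share the same intrinsic LR eta * lam = 1e-3.
for eta, lam in [(0.1, 0.01), (0.01, 0.1)]:
    norm_sq, angle = simulate(eta, lam)
    print(f"eta={eta}, lam={lam}: ||w||^2={norm_sq:.4f}, "
          f"angle={angle:.4f}, predicted={predicted_angle(eta, lam):.4f}")
```

In this toy model the two runs should end with equilibrium squared norms differing by roughly a factor of 10, yet with essentially the same angular step (about 0.045), illustrating how the effective speed of learning is governed by $\eta\lambda$ rather than by $\eta$ alone. Only the standard library is used, so the sketch runs without extra dependencies.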

artificial intelligence, machine learning, traditional optimization analysis
