Phase diagram of early training dynamics in deep neural networks: effect of the learning rate, depth, and width

Dec-26-2025, 11:32:01 GMT–Neural Information Processing Systems

We systematically analyze optimization dynamics in deep neural networks (DNNs) trained with stochastic gradient descent (SGD) and study the effect of the learning rate $\eta$, depth $d$, and width $w$ of the neural network. By analyzing the maximum eigenvalue $\lambda^H_t$ of the Hessian of the loss, which is a measure of the sharpness of the loss landscape, we find that the dynamics can show four distinct regimes: (i) an early-time transient regime, (ii) an intermediate saturation regime, (iii) a progressive sharpening regime, and (iv) a late-time edge-of-stability regime.
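As an illustration of the sharpness measure used above, the following is a minimal sketch of estimating the maximum Hessian eigenvalue by power iteration on Hessian-vector products. It is not the authors' code: the toy least-squares loss, the finite-difference Hessian-vector product, and the names `loss_grad`, `hvp`, and `top_hessian_eigenvalue` are all assumptions made for this example. Power iteration converges to the eigenvalue of largest magnitude, so the sketch also assumes that this eigenvalue is the largest positive one, which holds for the positive semidefinite Hessian of the toy loss.

```python
import numpy as np


def loss_grad(params, X, y):
    """Gradient of the toy loss 0.5 * mean((X @ params - y) ** 2)."""
    residual = X @ params - y
    return X.T @ residual / len(y)


def hvp(grad_fn, params, v, eps=1e-4):
    """Hessian-vector product via a central finite difference of gradients."""
    return (grad_fn(params + eps * v) - grad_fn(params - eps * v)) / (2.0 * eps)


def top_hessian_eigenvalue(grad_fn, params, n_iters=200, seed=0):
    """Estimate the largest-magnitude Hessian eigenvalue by power iteration."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=params.shape)
    v /= np.linalg.norm(v)
    eig = 0.0
    for _ in range(n_iters):
        hv = hvp(grad_fn, params, v)
        eig = float(v @ hv)  # Rayleigh quotient for the current unit vector
        norm = np.linalg.norm(hv)
        if norm == 0.0:
            break  # v lies in the null space of the Hessian
        v = hv / norm
    return eig


if __name__ == "__main__":
    # Hypothetical data: the exact Hessian is X.T @ X / 3 = diag(3, 4/3, 1/3),
    # so its largest eigenvalue is 3.
    X = np.diag([3.0, 2.0, 1.0])
    y = np.zeros(3)
    sharpness = top_hessian_eigenvalue(lambda p: loss_grad(p, X, y), np.zeros(3))
    print(f"estimated sharpness: {sharpness:.6f}")
```

For a real network, the finite-difference product would typically be replaced by an automatic-differentiation Hessian-vector product, and the estimate would be recomputed at each training step $t$ to track $\lambda^H_t$.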

deep neural network, early training dynamic, regime

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Neural Networks (0.91)
  - Statistical Learning > Gradient Descent (0.60)