Phase diagram of early training dynamics in deep neural networks: effect of the learning rate, depth, and width
Kalra, Dayal Singh, Barkeshli, Maissam
arXiv.org Artificial Intelligence
We systematically analyze optimization dynamics in deep neural networks (DNNs) trained with stochastic gradient descent (SGD) and study the effect of learning rate $\eta$, depth $d$, and width $w$ of the neural network. By analyzing the maximum eigenvalue $\lambda^H_t$ of the Hessian of the loss, which is a measure of sharpness of the loss landscape, we find that the dynamics can show four distinct regimes: (i) an early time transient regime, (ii) an intermediate saturation regime, (iii) a progressive sharpening regime, and (iv) a late time "edge of stability" regime. The early and intermediate regimes (i) and (ii) exhibit a rich phase diagram depending on $\eta \equiv c / \lambda_0^H$, $d$, and $w$. We identify several critical values of $c$, which separate qualitatively distinct phenomena in the early time dynamics of training loss and sharpness. Notably, we discover the opening up of a "sharpness reduction" phase, where sharpness decreases at early times, as $d$ and $1/w$ are increased.
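The abstract measures sharpness as the top Hessian eigenvalue $\lambda^H_t$ and scales the learning rate as $\eta \equiv c/\lambda_0^H$. A minimal sketch of how such a quantity can be estimated in practice is power iteration on Hessian-vector products; the example below uses a toy quadratic loss and a finite-difference HVP (both are illustrative assumptions, not the paper's setup, which trains DNNs with SGD):

```python
import numpy as np

def hvp(grad_fn, w, v, eps=1e-5):
    # Finite-difference Hessian-vector product:
    # H v ≈ (∇L(w + εv) − ∇L(w − εv)) / (2ε)
    return (grad_fn(w + eps * v) - grad_fn(w - eps * v)) / (2 * eps)

def sharpness(grad_fn, w, iters=100, seed=0):
    # Power iteration on Hessian-vector products estimates the
    # top Hessian eigenvalue λ^H (the sharpness measure).
    rng = np.random.default_rng(seed)
    v = rng.normal(size=w.shape)
    v /= np.linalg.norm(v)
    lam = 0.0
    for _ in range(iters):
        hv = hvp(grad_fn, w, v)
        lam = float(v @ hv)          # Rayleigh quotient with unit v
        v = hv / np.linalg.norm(hv)
    return lam

# Toy quadratic loss L(w) = 0.5 * wᵀ A w, so the Hessian is A
# with eigenvalues {1, 3}; the true sharpness is 3.
A = np.diag([1.0, 3.0])
grad = lambda w: A @ w
w0 = np.array([1.0, 1.0])

lam0 = sharpness(grad, w0)   # estimate of λ_0^H
c = 2.0
eta = c / lam0               # learning rate scaled as η = c / λ_0^H
```

For a real DNN one would replace `grad_fn` with an autodiff gradient of the training loss (e.g. an exact HVP from a framework's double backward) rather than the finite-difference stand-in used here.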
Oct-24-2023