Phase diagram of Stochastic Gradient Descent in high-dimensional two-layer neural networks

Veiga, Rodrigo, Stephan, Ludovic, Loureiro, Bruno, Krzakala, Florent, Zdeborová, Lenka

Feb-1-2022, arXiv.org Machine Learning

Despite the non-convex optimization landscape, over-parametrized shallow networks can achieve global convergence under gradient descent. The picture can be radically different for narrow networks, which tend to get stuck in poorly generalizing local minima. Here we investigate the cross-over between these two regimes in the high-dimensional setting, and in particular the connection between the so-called mean-field/hydrodynamic regime and the seminal approach of Saad & Solla. Focusing on the case of Gaussian data, we study the interplay between the learning rate, the time scale, and the number of hidden units in the high-dimensional dynamics of stochastic gradient descent (SGD). Our work builds on a deterministic description of SGD in high dimensions from statistical physics, which we extend and for which we provide rigorous convergence rates.

neural network, simulation, two-layer neural network

Country:
- South America > Brazil
  - São Paulo (0.04)
- Europe
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)
  - Switzerland > Vaud
    - Lausanne (0.04)

Genre:
- Research Report (0.64)

Industry:
- Education (0.47)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Statistical Learning > Gradient Descent (1.00)
  - Neural Networks (1.00)