Escaping mediocrity: how two-layer networks learn hard single-index models with SGD
Luca Arnaboldi, Florent Krzakala, Bruno Loureiro, Ludovic Stephan
arXiv.org Artificial Intelligence
This study explores the sample complexity for two-layer neural networks to learn a single-index target function under stochastic gradient descent (SGD), focusing on the challenging regime where many flat directions are present at initialization. It is well established that $n = O(d\log d)$ samples are typically needed in this scenario. However, we provide precise results on the pre-factors, both in high dimensions and for varying widths. Notably, our findings suggest that overparameterization can only improve convergence by a constant factor within this problem class. These insights are grounded in a reduction of the SGD dynamics to a low-dimensional stochastic process, where escaping mediocrity amounts to computing an exit time. Yet we show that a deterministic approximation of this process adequately captures that escape time, implying that the role of stochasticity may be minimal in this scenario.
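The setting described in the abstract admits a compact simulation. Below is a minimal sketch, not the authors' code, of one-pass SGD for a two-layer network learning a single-index target, tracking when the largest student-teacher overlap escapes its mediocre $O(d^{-1/2})$ initialization scale. The link function (taken to have information exponent 2, consistent with the quoted $n = O(d\log d)$ scaling), the ReLU activation, the $O(1/d)$ step size, and the escape threshold are all illustrative assumptions.

```python
import numpy as np

# Minimal sketch of the setting, not the authors' code: one-pass SGD for a
# two-layer network f(x) = (1/p) * sum_j a_j * relu(w_j . x) learning a
# single-index target y = g(w_star . x). The link g = He2, the activation,
# the step size, and the escape threshold below are all assumptions.

rng = np.random.default_rng(0)
d, p = 256, 8                          # input dimension, network width
lr = 0.5 / d                           # O(1/d) step size (assumption)

w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)       # teacher direction on the unit sphere

W = rng.standard_normal((p, d)) / np.sqrt(d)  # student first layer
a = np.ones(p)                                # second layer frozen for simplicity

relu = lambda z: np.maximum(z, 0.0)
drelu = lambda z: (z > 0).astype(float)
g = lambda z: z**2 - 1.0               # second Hermite polynomial He2

def max_overlap(W):
    """Largest normalized alignment of a student neuron with w_star."""
    return np.max(np.abs(W @ w_star) / np.linalg.norm(W, axis=1))

m0 = max_overlap(W)                    # starts at the mediocre O(1/sqrt(d)) scale
budget = int(50 * d * np.log(d))       # n = O(d log d) sample budget
for n in range(1, budget + 1):
    x = rng.standard_normal(d)         # fresh i.i.d. Gaussian sample (one pass)
    y = g(w_star @ x)
    pre = W @ x
    err = a @ relu(pre) / p - y
    # SGD step on the squared loss 0.5 * err**2, first layer only
    W -= lr * np.outer(err * a * drelu(pre) / p, x)
    if max_overlap(W) > 0.5:           # escape threshold (assumption)
        print(f"escaped mediocrity at n = {n}, from initial overlap {m0:.3f}")
        break
else:
    print(f"no escape within n = {budget}; overlap = {max_overlap(W):.3f}")
```

Heuristically, for an information-exponent-2 link the deterministic drift on the overlap $m$ is linear near $m \approx 0$, so starting from $m_0 \sim d^{-1/2}$ the deterministic dynamics take on the order of $\log d$ time units to reach order one; with an $O(1/d)$ step size this corresponds to $n = O(d\log d)$ samples, in line with the scaling quoted above.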
May 29, 2023