minima
Sharp description of local minima in the loss landscape of high-dimensional two-layer ReLU neural networks
Huang, Jie, Loureiro, Bruno, Mannelli, Stefano Sarao
We study the population loss landscape of two-layer ReLU networks of the form $\sum_{k=1}^K \mathrm{ReLU}(w_k^\top x)$ in a realisable teacher-student setting with Gaussian covariates. We show that local minima admit an exact low-dimensional representation in terms of summary statistics, yielding a sharp and interpretable characterisation of the landscape. We further establish a direct link with one-pass SGD: local minima correspond to attractive fixed points of the dynamics in summary statistics space. This perspective reveals a hierarchical structure of minima: they are typically isolated in the well-specified regime, but become connected by flat directions as network width increases. In this overparameterised regime, global minima become increasingly accessible, attracting the dynamics and reducing convergence to spurious solutions. Overall, our results reveal intrinsic limitations of common simplifying assumptions, which may miss essential features of the loss landscape even in minimal neural network models.
- North America > Canada > Alberta > Census Division No. 15 > Improvement District No. 9 > Banff (0.04)
- Europe > France (0.04)
- Asia > Singapore (0.04)
- Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Asia > Middle East > Israel (0.04)
- Asia > Macao (0.14)
- Asia > Middle East > Jordan (0.04)
- Asia > China > Shaanxi Province > Xi'an (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Asia > Middle East > Jordan (0.04)
On the Stability of Nonlinear Dynamics in GD and SGD: Beyond Quadratic Potentials
Mulayoff, Rotem, Stich, Sebastian U.
The dynamical stability of the iterates during training plays a key role in determining the minima obtained by optimization algorithms. For example, stable solutions of gradient descent (GD) correspond to flat minima, which have been associated with favorable features. While prior work often relies on linearization to determine stability, it remains unclear whether linearized dynamics faithfully capture the full nonlinear behavior. Recent work has shown that GD may stably oscillate near a linearly unstable minimum and still converge once the step size decays, indicating that linear analysis can be misleading. In this work, we explicitly study the effect of nonlinear terms. Specifically, we derive an exact criterion for stable oscillations of GD near minima in the multivariate setting. Our condition depends on high-order derivatives, generalizing existing results. Extending the analysis to stochastic gradient descent (SGD), we show that nonlinear dynamics can diverge in expectation even if a single batch is unstable. This implies that stability can be dictated by a single batch that oscillates unstably, rather than an average effect, as linear analysis suggests. Finally, we prove that if all batches are linearly stable, the nonlinear dynamics of SGD are stable in expectation.
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.87)
- Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.05)
- Europe > France (0.05)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)