SGD learning on neural networks: leap complexity and saddle-to-saddle dynamics

Abbe, Emmanuel, Boix-Adsera, Enric, Misiakiewicz, Theodor

arXiv.org Machine Learning 

Deep learning has emerged as the standard approach to exploiting massive high-dimensional datasets. At the core of its success lies its capability to learn effective features with fairly black-box architectures without suffering from the curse of dimensionality. To explain this success, two structural properties of data are commonly conjectured: (i) a low-dimensional structure that SGD-trained neural networks are able to adapt to; (ii) a hierarchical structure that neural networks can leverage with SGD training. In particular, from a statistical viewpoint, a line of work [Bac17, SH20, KK16, BK19] has investigated the sample complexity of learning with deep neural networks, decoupled from computational considerations. By directly considering global solutions of empirical risk minimization (ERM) problems over arbitrarily large neural networks with sparsity-inducing norms, these works showed that deep neural networks can overcome the curse of dimensionality on classes of functions with low-dimensional and hierarchical structures. However, this approach does not provide efficient algorithms: a number of works have instead shown computational hardness of ERM problems [BR88, KS09, DLSS14], and it is unclear how much this line of work can inform practical neural networks, which are trained using SGD and its variants.
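As an illustration of the statistical setting described above, such norm-regularized ERM problems can be written schematically as (the notation here is generic and is not the exact formulation used in the cited works)

    \hat{f} \in \arg\min_{f \in \mathcal{F}_{\mathrm{NN}}} \; \frac{1}{n} \sum_{i=1}^{n} \ell\big(f(x_i), y_i\big) + \lambda \, \|f\|,

where \mathcal{F}_{\mathrm{NN}} is an arbitrarily large neural network class, \ell is a loss function, \lambda > 0 is a regularization parameter, and \|f\| denotes a sparsity-inducing norm (for instance, a path- or variation-type norm). The sample-complexity guarantees in this line of work concern global minimizers of such objectives, independently of whether they can be computed efficiently.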
