SGD learning on neural networks: leap complexity and saddle-to-saddle dynamics

Abbe, Emmanuel, Boix-Adsera, Enric, Misiakiewicz, Theodor

arXiv.org Machine Learning 

Deep learning has emerged as the standard approach to exploiting massive high-dimensional datasets. At the core of its success lies its capability to learn effective features with fairly black-box architectures without suffering from the curse of dimensionality. To explain this success, two structural properties of data are commonly conjectured: (i) a low-dimensional structure that SGD-trained neural networks are able to adapt to; (ii) a hierarchical structure that neural networks can leverage with SGD training. In particular, from a statistical viewpoint, a line of work [Bac17, SH20, KK16, BK19] has investigated the sample complexity of learning with deep neural networks, decoupled from computational considerations. By directly considering global solutions of empirical risk minimization (ERM) problems over arbitrarily large neural networks with sparsity-inducing norms, these works showed that deep neural networks can overcome the curse of dimensionality on classes of functions with low-dimensional and hierarchical structures. However, this approach does not provide efficient algorithms: a number of works have instead shown computational hardness of ERM problems [BR88, KS09, DLSS14], and it is unclear how much this line of work can inform practical neural networks, which are trained using SGD and its variants.
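As an illustration of the statistical setting described above, such norm-regularized ERM problems can be written schematically as (the notation here is generic and is not the exact formulation used in the cited works)

    \hat{f} \in \arg\min_{f \in \mathcal{F}_{\mathrm{NN}}} \; \frac{1}{n} \sum_{i=1}^{n} \ell\big(f(x_i), y_i\big) + \lambda \, \|f\|,

where \mathcal{F}_{\mathrm{NN}} is an arbitrarily large neural network class, \ell is a loss function, \lambda > 0 is a regularization parameter, and \|f\| denotes a sparsity-inducing norm (for instance, a path- or variation-type norm). The sample-complexity guarantees in this line of work concern global minimizers of such objectives, independently of whether they can be computed efficiently.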
