Classifying high-dimensional Gaussian mixtures: Where kernel methods fail and neural networks succeed
Refinetti, Maria, Goldt, Sebastian, Krzakala, Florent, Zdeborová, Lenka
Explaining the success of deep neural networks in many areas of machine learning remains a key challenge for learning theory. A series of recent theoretical works made progress towards this goal by proving the trainability of two-layer neural networks (2LNN) with gradient-based methods [1-6]. These results are based on the observation that strongly over-parameterised 2LNN can achieve good performance even if their first-layer weights remain almost constant throughout training. This is the case if the initial weights are chosen with a particular scaling, dubbed the "lazy regime" by Chizat et al. [7]. This behaviour is to be contrasted with the "feature learning regime", where the first-layer weights move significantly during training. Going a step further, simply fixing the first-layer weights of a 2LNN at their initial values yields the well-known random features model of Rahimi & Recht [8, 9], which can be seen as an approximation of kernel learning [10]. Recent empirical studies showed that, on some benchmark data sets in computer vision, kernels derived from neural networks achieve performance comparable to that of neural networks [11-16]. These results raise the question of whether neural networks only learn successfully when random features can also learn successfully, and have led to renewed interest in the exact conditions under which neural networks trained with gradient descent achieve better performance than random features [17-20].
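The distinction between the two regimes can be illustrated concretely: freezing the first-layer weights of a 2LNN at their random initialisation and training only the second layer gives a random-features model, while training both layers allows feature learning. The following is a minimal sketch of that comparison on a toy two-cluster Gaussian mixture; the task, network width, learning rate, and helper names are illustrative assumptions and are not taken from the paper.

```python
# Minimal sketch (illustrative, not the paper's setup): a two-layer network
# (2LNN) on a toy Gaussian-mixture classification task. Freezing the first
# layer at its random initialisation yields a random-features model; letting
# it train corresponds to the feature-learning regime.
import torch
import torch.nn as nn

def make_gaussian_mixture(n, d, sep=2.0):
    """Two isotropic Gaussian clusters in d dimensions with labels in {-1, +1}."""
    y = torch.randint(0, 2, (n,)) * 2 - 1          # random labels -1 / +1
    mu = torch.zeros(d); mu[0] = sep               # cluster means at +/- sep along the first axis
    x = y[:, None] * mu + torch.randn(n, d)
    return x, y.float()

def train_2lnn(freeze_first_layer, d=50, hidden=200, n=2000, steps=500, lr=0.05):
    x, y = make_gaussian_mixture(n, d)
    model = nn.Sequential(nn.Linear(d, hidden), nn.ReLU(), nn.Linear(hidden, 1))
    if freeze_first_layer:                         # random-features regime
        for p in model[0].parameters():
            p.requires_grad_(False)
    params = [p for p in model.parameters() if p.requires_grad]
    opt = torch.optim.SGD(params, lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x).squeeze(-1), y)
        loss.backward()
        opt.step()
    return loss.item()

print("random features, final loss :", train_2lnn(freeze_first_layer=True))
print("feature learning, final loss:", train_2lnn(freeze_first_layer=False))
```

This toy problem is linearly separable, so both regimes will do reasonably well here; the paper's point is that on suitably structured high-dimensional Gaussian mixtures the gap between the two regimes becomes essential.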
Feb-23-2021