Classifying high-dimensional Gaussian mixtures: Where kernel methods fail and neural networks succeed

Maria Refinetti, Sebastian Goldt, Florent Krzakala, Lenka Zdeborová

arXiv.org Machine Learning 

Explaining the success of deep neural networks in many areas of machine learning remains a key challenge for learning theory. A series of recent theoretical works made progress towards this goal by proving the trainability of two-layer neural networks (2LNN) with gradient-based methods [1-6]. These results are based on the observation that strongly over-parameterised 2LNN can achieve good performance even if their first-layer weights remain almost constant throughout training. This is the case if the initial weights are chosen with a particular scaling, dubbed the "lazy regime" by Chizat et al. [7]. This behaviour is to be contrasted with the "feature learning regime", where the weights of the first layer move significantly during training. Going a step further, simply fixing the first-layer weights of a 2LNN at their initial values yields the well-known random features model of Rahimi & Recht [8, 9], which can be seen as an approximation of kernel learning [10]. Recent empirical studies showed that on some benchmark data sets in computer vision, kernels derived from neural networks achieve performance comparable to that of neural networks [11-16]. These results raise the question of whether neural networks learn successfully only when random features also do, and they have led to renewed interest in the exact conditions under which neural networks trained with gradient descent achieve better performance than random features [17-20].
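
As a concrete illustration of the distinction drawn above, the sketch below (not the authors' code) trains the same two-layer network twice on a toy XOR-like Gaussian mixture: once with all weights trained by gradient descent (feature learning), and once with the first layer frozen at its random initialisation, which is precisely the random features model of Rahimi & Recht. The data construction, network sizes and hyperparameters are illustrative assumptions, not the paper's exact setting.

```python
# Minimal sketch: feature learning vs. random features on a toy Gaussian mixture.
# All sizes, scales and learning rates are assumed for illustration only.
import torch
import torch.nn as nn

torch.manual_seed(0)
d, n, k = 50, 4000, 100                     # input dim, samples, hidden units (assumed)

# Four clusters at (+/-u, +/-v) with XOR labels: +1 for ++/--, -1 for +-/-+.
u = torch.zeros(d); u[0] = 3.0
v = torch.zeros(d); v[1] = 3.0
s = torch.randint(0, 2, (n, 2)) * 2 - 1     # random sign pattern per sample
X = s[:, :1] * u + s[:, 1:] * v + torch.randn(n, d)
y = (s[:, 0] * s[:, 1]).float()

def train_2lnn(freeze_first_layer, steps=1000, lr=0.05):
    model = nn.Sequential(nn.Linear(d, k), nn.ReLU(), nn.Linear(k, 1))
    if freeze_first_layer:                  # random-features baseline: first layer
        for p in model[0].parameters():     # stays at its random initialisation
            p.requires_grad_(False)
    params = [p for p in model.parameters() if p.requires_grad]
    opt = torch.optim.SGD(params, lr=lr)
    loss_fn = nn.SoftMarginLoss()           # logistic loss on +/-1 labels
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(X).squeeze(), y).backward()
        opt.step()
    return (model(X).squeeze().sign() == y).float().mean().item()

print("feature learning (all weights trained):", train_2lnn(freeze_first_layer=False))
print("random features  (first layer frozen) :", train_2lnn(freeze_first_layer=True))
```

Freezing the first layer makes the second run equivalent to fitting only the output weights on k fixed random ReLU features, which is the sense in which the random features model approximates kernel learning.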
