On Feature Learning in Neural Networks with Global Convergence Guarantees

Chen, Zhengdao, Vanden-Eijnden, Eric, Bruna, Joan

Apr-22-2022–arXiv.org Machine Learning

We study the optimization of wide neural networks (NNs) via gradient flow (GF) in setups that allow feature learning while admitting non-asymptotic global convergence guarantees. First, for wide shallow NNs under the mean-field scaling and with a general class of activation functions, we prove that when the input dimension is no less than the size of the training set, the training loss converges to zero at a linear rate under GF. Building upon this analysis, we study a model of wide multi-layer NNs whose second-to-last layer is trained via GF, for which we also prove a linear-rate convergence of the training loss to zero, but regardless of the input dimension. We also show empirically that, unlike in the Neural Tangent Kernel (NTK) regime, our multi-layer model exhibits feature learning and can achieve better generalization performance than its NTK counterpart.
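
As a rough illustration of the shallow setting described in the abstract, the sketch below trains a wide two-layer network under mean-field scaling, f(x) = (1/m) Σ_i c_i tanh(w_i · x), with plain gradient descent standing in for gradient flow. It is not the authors' code. The function name `train_mean_field_shallow`, the fixed ±1 output weights, the tanh activation, the synthetic data, and all hyperparameters are assumptions chosen for the example. The input dimension is set no smaller than the number of training points (d = 16, n = 8), matching the condition in the abstract.

```python
import numpy as np

def train_mean_field_shallow(n=8, d=16, m=512, steps=2000, lr=1.0, seed=0):
    """Gradient descent on f(x) = (1/m) * sum_i c_i * tanh(w_i . x).

    Hypothetical setup: only the first-layer weights W are trained and the
    output weights c are fixed random signs. Returns the loss history.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d)) / np.sqrt(d)   # n inputs with d >= n
    y = rng.uniform(-0.5, 0.5, size=n)             # targets inside the range of f
    W = rng.standard_normal((m, d))                # first-layer weights (trained)
    c = rng.choice([-1.0, 1.0], size=m)            # output weights (fixed)

    losses = np.empty(steps)
    for t in range(steps):
        act = np.tanh(X @ W.T)                     # (n, m) hidden activations
        resid = act @ c / m - y                    # (n,) prediction errors
        losses[t] = 0.5 * np.mean(resid ** 2)
        # dL/dW_i = (1/n) * sum_k resid_k * (c_i/m) * tanh'(w_i . x_k) * x_k
        back = (1.0 - act ** 2) * c * resid[:, None]   # (n, m)
        grad_W = back.T @ X / (n * m)                  # (m, d)
        # Mean-field time scaling: the step size is multiplied by the width m.
        W -= lr * m * grad_W
    return losses

if __name__ == "__main__":
    losses = train_mean_field_shallow()
    print(f"initial loss {losses[0]:.3e}, final loss {losses[-1]:.3e}")
```

Plotting `losses` on a logarithmic axis is one way to check by eye whether the decay looks roughly linear-rate on this toy problem. The sketch says nothing about the multi-layer model or the generalization comparison with the NTK regime.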

artificial intelligence, machine learning, neural network

Country:
- North America > United States > Texas > Clay County (0.24)

Genre:
- Research Report (0.50)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
