The Convergence Rate of Neural Networks for Learned Functions of Different Frequencies

Neural Information Processing Systems

We study the relationship between the frequency of a function and the speed at which a neural network learns it. We build on recent results that show that the dynamics of overparameterized neural networks trained with gradient descent can be well approximated by a linear system. When normalized training data is uniformly distributed on a hypersphere, the eigenfunctions of this linear system are spherical harmonic functions.
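
To make the linear-system picture concrete, the following is a minimal sketch under stated assumptions, not the paper's own code. It assumes the training residual evolves as r_{t+1} = (I - lr * H) r_t for a symmetric positive semidefinite matrix H, so that the residual component along each eigenvector of H shrinks by a factor (1 - lr * lambda) per step. The names `residual_decay` and `simulate`, the eigenvalues, the learning rate, and the step count are all hypothetical and chosen only for illustration.

```python
import numpy as np


def residual_decay(eigenvalues, learning_rate, steps):
    """Closed-form size of the residual along each eigenvector after `steps` updates.

    Assumes the linear model r_{t+1} = (I - learning_rate * H) r_t and a
    unit initial residual along every eigenvector of H.
    """
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    return np.abs(1.0 - learning_rate * eigenvalues) ** steps


def simulate(H, r0, learning_rate, steps):
    """Iterate the assumed linear residual dynamics explicitly."""
    r = np.array(r0, dtype=float)
    for _ in range(steps):
        r = r - learning_rate * (H @ r)
    return r


# Hypothetical eigenvalues (illustrative only, not taken from the paper).
eigs = np.array([1.0, 0.1, 0.01])

# Build a symmetric matrix H with those eigenvalues and a random orthonormal eigenbasis Q.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
H = Q @ np.diag(eigs) @ Q.T

# Start with a unit residual along every eigenvector, then run 20 steps.
r_final = simulate(H, Q @ np.ones(3), learning_rate=0.5, steps=20)

# Project the final residual back onto the eigenbasis and compare with the closed form.
print(np.abs(Q.T @ r_final))
print(residual_decay(eigs, learning_rate=0.5, steps=20))
```

In this toy setting, the component with the largest eigenvalue vanishes almost completely while the component with the smallest eigenvalue has barely moved. Under the assumed dynamics, the convergence speed along each eigenfunction is therefore governed by its eigenvalue.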