Spectral Bias Outside the Training Set for Deep Networks in the Kernel Regime

Jan-18-2025, 19:40:58 GMT–Neural Information Processing Systems

We provide quantitative bounds on the L^2 difference in function space between the trajectory of a finite-width network trained on finitely many samples and the idealized kernel dynamics of infinite width and infinite data. An implication of the bounds is that the network is biased to learn the top eigenfunctions of the Neural Tangent Kernel (NTK) not just on the training set but over the entire input space. This bias depends on the model architecture and input distribution alone; it does not depend on the target function, which need not lie in the reproducing kernel Hilbert space (RKHS) of the kernel. The result is valid for deep architectures with fully connected, convolutional, and residual layers. Furthermore, the width does not need to grow polynomially with the number of samples in order to obtain high-probability bounds up to a stopping time.
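The bias toward top eigenfunctions can be illustrated with a minimal NumPy sketch of idealized kernel gradient flow on a fixed sample. It is not the paper's construction: the Gaussian kernel is a stand-in for the NTK, and the grid, bandwidth, time horizon, 1/n normalization, and the helper name `kernel_flow_residual` are all assumptions made for illustration. Under squared loss the residual r = f - y evolves as dr/dt = -(1/n) K r, so its component along an eigenvector with eigenvalue lambda decays like exp(-lambda t).

```python
import numpy as np

def kernel_flow_residual(K, r0, t):
    """Residual at time t under kernel gradient flow dr/dt = -(1/n) K r."""
    n = K.shape[0]
    eigvals, eigvecs = np.linalg.eigh(K / n)   # eigenvalues in ascending order
    coeffs = eigvecs.T @ r0                    # initial residual in the eigenbasis
    r_t = eigvecs @ (np.exp(-eigvals * t) * coeffs)
    return r_t, eigvals, eigvecs

# Stand-in kernel (assumption): Gaussian kernel on a 1-D grid, not an actual NTK.
x = np.linspace(-1.0, 1.0, 50)
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / 0.5)

rng = np.random.default_rng(0)
r0 = rng.standard_normal(x.size)               # hypothetical initial residual f_0 - y

r_t, eigvals, eigvecs = kernel_flow_residual(K, r0, t=5.0)

# Fraction of each eigen-component of the residual still unlearned at time t.
remaining = np.abs(eigvecs.T @ r_t) / np.abs(eigvecs.T @ r0)
top, bottom = remaining[-1], remaining[0]      # largest / smallest eigenvalue
print(f"top component remaining: {top:.3f}, bottom component remaining: {bottom:.3f}")
```

In this toy setting the component along the top eigenvector is mostly fitted by t = 5, while the component along the smallest eigenvalue is essentially untouched. The paper's bounds concern how closely a finite-width network on finite data tracks dynamics of this kind over the whole input space, which this sketch does not attempt to show.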

deep network, kernel regime, spectral bias

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (1.00)