Generalization Error Curves for Analytic Spectral Algorithms under Power-law Decay

Li, Yicheng, Gan, Weiye, Shi, Zuoqiang, Lin, Qian

arXiv.org Artificial Intelligence 

The neural tangent kernel (NTK) theory (Jacot et al., 2018), which shows that kernel regression with the NTK closely approximates an over-parametrized neural network trained by gradient descent (Jacot et al., 2018; Allen-Zhu et al., 2019; Lee et al., 2019), provides a natural surrogate for understanding the generalization behavior of neural networks in certain regimes. This surrogate has led to a recent renaissance in the study of kernel methods. For example, one may ask whether overfitting harms generalization (Bartlett et al., 2020), how the smoothness of the underlying regression function affects the generalization error (Li et al., 2023), or whether one can determine a lower bound on the generalization error for a specific regression function. All of these questions can be answered by the generalization error curve, which aims to determine the exact generalization error of a given kernel regression method as a function of the kernel, the regression function, the noise level, and the choice of the regularization parameter. Such a generalization error curve provides a comprehensive picture of the generalization ability of the corresponding kernel regression method (Bordelon et al., 2020; Cui et al., 2021; Li et al., 2023).
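As a concrete illustration (a heuristic sketch, not a statement of this paper's results), consider the commonly assumed power-law setup in which the kernel eigenvalues decay as $\lambda_i \asymp i^{-\beta}$ with $\beta > 1$ and the regression function $f^*$ has eigen-coefficients $f_i^* = \langle f^*, e_i \rangle$. For kernel ridge regression with regularization parameter $\lambda$, sample size $n$, and noise level $\sigma^2$, works in the spirit of Bordelon et al. (2020) and Cui et al. (2021) approximate the generalization error curve by a bias-variance expression of roughly the following form; the precise constants and the effective (self-consistently defined) regularization differ across analyses:

\[
  R(\lambda) \;\approx\;
  \underbrace{\sum_{i \ge 1} \Bigl( \frac{\lambda}{\lambda_i + \lambda} \Bigr)^{2} (f_i^{*})^{2}}_{\text{bias}}
  \;+\;
  \underbrace{\frac{\sigma^{2}}{n} \sum_{i \ge 1} \Bigl( \frac{\lambda_i}{\lambda_i + \lambda} \Bigr)^{2}}_{\text{variance}},
  \qquad \lambda_i \asymp i^{-\beta}.
\]

Minimizing such a curve over $\lambda$ yields the familiar power-law rates in $n$; making this kind of curve rigorous for a general class of analytic spectral algorithms under power-law decay is the type of question the present paper addresses.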