Learning Curves and Benign Overfitting of Spectral Algorithms in Large Dimensions

Lu, Weihao, Lin, Qian, Xia, Yingcun, Huang, Dongming

arXiv.org Machine Learning 

Existing large-dimensional theory for spectral algorithms resolves either the optimally tuned point or the interpolation limit, but leaves the under-regularized regime unexplored. We study the learning curve and benign overfitting of spectral algorithms in the largedimensional setting where the sample size and dimension are of comparable order, i.e., n dγ for some γ > 0. We first consider inner-product kernels on the sphere Sd 1 and establish a sharp asymptotic characterization of the excess risk across the full regularization path under various source conditions s 0, where smeasures the relative smoothness of the regression function. Our results reveal that the learning curve is not simply U-shaped but instead consists of three distinct regimes: over-regularized, under-regularized, and interpolation regimes. This characterization allows us to fully capture the benign overfitting phenomenon, demonstrating that benign overfitting arises consistently across both the under-regularized and interpolation regimes whenever sis positive but no larger than a critical threshold. We further show that, in the sufficiently regularized regime, the kernel learning curve is recovered by an associated sequence model. Finally, we extend the learning-curve analysis to large-dimensional KRR for a class of kernels on general domains in Rd whose low-degree eigenspaces satisfy spectral-scaling and hyper-contractivity conditions. Keywords: Spectral algorithms, learning curves, high dimension, benign overfitting. 1 Introduction Nonparametric regression studies the estimation of an unknown function f: Rd R from ni.i.d.