Learning Curves of Stochastic Gradient Descent in Kernel Regression
Haihan Zhang, Weicheng Lin, Yuanshi Liu, Cong Fang
Non-parametric least-squares regression within the reproducing kernel Hilbert space (RKHS) framework is a cornerstone of statistical learning theory. A mainstream method for this problem is kernel ridge regression (KRR), whose statistical optimality has been analyzed extensively [Caponnetto and De Vito, 2007, Smale and Zhou, 2007, Zhang et al., 2024b]. Recent years have witnessed a renaissance of interest in kernel methods, driven by neural tangent kernel (NTK) theory [Jacot et al., 2018, Arora et al., 2019], which shows that sufficiently wide neural networks, under suitable initialization, are well approximated by a deterministic kernel model derived from the network architecture. Although deep learning often operates in regimes beyond the traditional statistical mindset, recent advances demonstrate that its generalization mysteries are not peculiar to neural networks: the same phenomena arise in kernel regression, particularly in the high-dimensional regime [Ghorbani et al., 2021, Liang and Rakhlin, 2020, Zhang et al., 2024c]. Substantial progress has been made in these regimes for kernel ridge and ridgeless methods. For instance, Liang and Rakhlin [2020] demonstrate benign overfitting for ridgeless regression, a phenomenon in which the model interpolates the training data yet still generalizes well.
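For concreteness, the KRR estimator referenced above admits the following standard formulation (the notation, such as the regularization parameter $\lambda$ and the Gram matrix $K$, is generic and not taken from this excerpt): given samples $(x_i, y_i)_{i=1}^{n}$ and an RKHS $\mathcal{H}$ with kernel $k$,
$$
\hat f_\lambda \;=\; \operatorname*{arg\,min}_{f \in \mathcal{H}} \; \frac{1}{n}\sum_{i=1}^{n}\bigl(f(x_i) - y_i\bigr)^2 + \lambda \,\|f\|_{\mathcal{H}}^2,
\qquad
\hat f_\lambda(x) \;=\; k(x, X)\,\bigl(K + n\lambda I_n\bigr)^{-1} y,
$$
where $K_{ij} = k(x_i, x_j)$. The ridgeless (interpolating) estimator studied by Liang and Rakhlin [2020] corresponds to the minimum-norm limit $\lambda \to 0^{+}$.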
May 29, 2025