Generalization Error Rates in Kernel Regression: The Crossover from the Noiseless to Noisy Regime
Cui, Hugo, Loureiro, Bruno, Krzakala, Florent, Zdeborová, Lenka
Kernel methods are among the most popular models in machine learning. Despite their relative simplicity, they define a powerful framework in which non-linear features can be exploited without leaving the realm of convex optimisation. Kernel methods in machine learning have a long and rich literature dating back to the 1960s [1, 2], but have recently returned to the spotlight as a proxy for studying neural networks in different regimes, e.g. the infinite width limit [3-6] and the lazy regime of training [7]. Despite being defined in terms of a non-parametric optimisation problem, kernel methods can be mathematically understood as a standard parametric linear problem in a (possibly infinite-dimensional) Hilbert space spanned by the kernel eigenvectors (a.k.a. features). This dual picture fully characterizes the asymptotic performance of kernels in terms of a trade-off between two key quantities: the relative decay of the eigenvalues of the kernel (a.k.a.
May-31-2021
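The dual picture mentioned in the abstract can be illustrated with a small numerical sketch. This is not the authors' code: the feature map (cosine modes weighted by power-law "eigenvalues"), the helper names, the target function, and all parameter values are assumptions chosen for illustration only. The sketch fits the same ridge problem twice, once in the non-parametric (kernel) form and once as a parametric linear regression on explicit features, and checks that the two predictions agree.

```python
import numpy as np

def features(x, p=20, decay=2.0):
    """Hypothetical feature map: p cosine modes scaled by sqrt of power-law eigenvalues."""
    k = np.arange(1, p + 1, dtype=float)
    eigvals = k ** (-decay)                      # assumed spectrum: lambda_k = k^(-decay)
    return np.sqrt(eigvals) * np.cos(np.pi * np.outer(x, k))

def fit_kernel(x_train, y_train, reg):
    """Non-parametric form: solve (K + reg * I) alpha = y with K = Phi Phi^T."""
    phi = features(x_train)
    gram = phi @ phi.T
    return np.linalg.solve(gram + reg * np.eye(len(x_train)), y_train)

def predict_kernel(alpha, x_train, x_test):
    """Predict with k(x_test, x_train) @ alpha."""
    return features(x_test) @ features(x_train).T @ alpha

def fit_linear(x_train, y_train, reg):
    """Parametric form: ridge regression on the explicit features."""
    phi = features(x_train)
    p = phi.shape[1]
    return np.linalg.solve(phi.T @ phi + reg * np.eye(p), phi.T @ y_train)

def predict_linear(w, x_test):
    """Predict with phi(x_test) @ w."""
    return features(x_test) @ w

# Synthetic data (illustrative only): a noisy sine on [0, 1].
rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 1.0, size=30)
y_train = np.sin(2 * np.pi * x_train) + 0.1 * rng.normal(size=30)
x_test = np.linspace(0.0, 1.0, 50)
reg = 1e-2

pred_dual = predict_kernel(fit_kernel(x_train, y_train, reg), x_train, x_test)
pred_primal = predict_linear(fit_linear(x_train, y_train, reg), x_test)

# The two formulations should coincide up to floating-point error.
print("max |kernel - linear| =", np.max(np.abs(pred_dual - pred_primal)))
```

The truncation to a finite number of features is a simplification; in the setting the abstract describes, the feature space may be infinite-dimensional, and only the kernel form remains directly computable.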