interpolation
Spectral-Transport Stability and Benign Overfitting in Interpolating Learning
Fredriksson-Imanov, Gustav Olaf Yunus Laitinen-Lundström
We develop a theoretical framework for generalization in the interpolating regime of statistical learning. The central question is why highly overparameterized estimators can attain zero empirical risk while still achieving nontrivial predictive accuracy, and how to characterize the boundary between benign and destructive overfitting. We introduce a spectral-transport stability framework in which excess risk is controlled jointly by the spectral geometry of the data distribution, the sensitivity of the learning rule under single-sample replacement, and the alignment structure of label noise. This leads to a scale-dependent Fredriksson index that combines effective dimension, transport stability, and noise alignment into a single complexity parameter for interpolating estimators. We prove finite-sample risk bounds, establish a sharp benign-overfitting criterion through the vanishing of the index along admissible spectral scales, and derive explicit phase-transition rates under polynomial spectral decay. For a model-specific specialization, we obtain an explicit theorem for polynomial-spectrum linear interpolation, together with a proof of the resulting rate. The framework also clarifies implicit regularization by showing how optimization dynamics can select interpolating solutions of minimal spectral-transport energy. These results connect algorithmic stability, double descent, benign overfitting, operator-theoretic learning theory, and implicit bias within a unified structural account of modern interpolation.
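The interpolating estimators the abstract discusses can be illustrated, in the simplest linear case, by the minimum-ℓ2-norm interpolator, which attains zero empirical risk whenever the design has full row rank. The sketch below is not taken from the paper; the dimensions, decay exponent, and noise level are illustrative assumptions chosen to mimic a polynomially decaying spectrum.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 500                              # overparameterized: d >> n
# Polynomially decaying spectrum (illustrative exponent)
spectrum = np.arange(1, d + 1) ** -1.0
X = rng.standard_normal((n, d)) * np.sqrt(spectrum)

w_star = np.zeros(d)
w_star[0] = 1.0                             # signal on the top spectral direction
y = X @ w_star + 0.1 * rng.standard_normal(n)

# Minimum-l2-norm interpolator: w_hat = X^+ y via the pseudo-inverse.
# Since rank(X) = n almost surely, X @ w_hat = y exactly.
w_hat = np.linalg.pinv(X) @ y
train_err = np.max(np.abs(X @ w_hat - y))   # zero empirical risk
```

Despite fitting the noisy labels exactly, such estimators can generalize when the spectrum decays fast enough, which is the regime the paper's phase-transition rates quantify.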
Kriging via variably scaled kernels
Gianluca Audone, Francesco Marchetti, Emma Perracchione, Milvia Rossini
Classical Gaussian processes and Kriging models are commonly based on stationary kernels, whereby correlations between observations depend exclusively on the relative distance between scattered data. While this assumption ensures analytical tractability, it limits the ability of Gaussian processes to represent heterogeneous correlation structures. In this work, we investigate variably scaled kernels as an effective tool for constructing non-stationary Gaussian processes by explicitly modifying the correlation structure of the data. Through a scaling function, variably scaled kernels alter the correlations between data and enable the modeling of targets exhibiting abrupt changes or discontinuities. We analyse the resulting predictive uncertainty via the variably scaled kernel power function and clarify the relationship between variably scaled kernel constructions and classical non-stationary kernels. Numerical experiments demonstrate that Gaussian processes based on variably scaled kernels yield improved reconstruction accuracy and provide uncertainty estimates that reflect the underlying structure of the data.
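The core construction, evaluating a standard kernel on inputs augmented with an extra coordinate given by the scaling function, can be sketched as follows. The Gaussian base kernel, the jump location, and the step-function scaling are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def gaussian_kernel(A, B, eps=10.0):
    # Gaussian RBF kernel between row-wise point sets A and B
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-eps * d2)

def vsk_gram(x, y, psi, eps=10.0):
    # Variably scaled kernel: evaluate the base kernel on inputs
    # augmented with an extra coordinate given by the scaling function psi
    X = np.column_stack([x, psi(x)])
    Y = np.column_stack([y, psi(y)])
    return gaussian_kernel(X, Y, eps)

# Scaling function encoding a (known or estimated) jump at x = 0.5
psi = lambda x: np.where(x < 0.5, 0.0, 1.0)

# Two pairs of points at the same base distance 0.1:
same_side = vsk_gram(np.array([0.30]), np.array([0.40]), psi)[0, 0]
across    = vsk_gram(np.array([0.45]), np.array([0.55]), psi)[0, 0]
# The augmented coordinate pushes points on opposite sides of the jump
# apart, so their correlation drops relative to the same-side pair.
```

This is how a VSK-based Gaussian process de-correlates observations across a discontinuity while leaving same-side correlations essentially unchanged.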
Natural Value Approximators: Learning when to Trust Past Estimates
Neural networks have a smooth initial inductive bias, such that small changes in input do not lead to large changes in output. However, in reinforcement learning domains with sparse rewards, value functions have non-smooth structure with a characteristic asymmetric discontinuity whenever rewards arrive. We propose a mechanism that learns an interpolation between a direct value estimate and a projected value estimate computed from the encountered reward and the previous estimate. This reduces the need to learn about discontinuities, and thus improves the value function approximation. Furthermore, as the interpolation is learned and state-dependent, our method can deal with heterogeneous observability. We demonstrate that this one change leads to significant improvements on multiple Atari games, when applied to the state-of-the-art A3C algorithm.
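One way to write the learned interpolation the abstract describes is a gated recursion that blends the network's direct value estimate with an estimate projected forward from the previous step through the observed reward. The helper name, the exact indexing, and the initialization below are assumptions for illustration, not the published formulation.

```python
import numpy as np

def natural_value_estimates(direct_v, rewards, beta, gamma=0.99):
    # Blend a direct per-step value estimate with a projected estimate
    # carried forward from the previous step:
    #   v_nat[t] = beta[t] * direct_v[t]
    #            + (1 - beta[t]) * (v_nat[t-1] - rewards[t]) / gamma
    # beta[t] in [0, 1] plays the role of the learned, state-dependent gate;
    # here it is supplied as an array rather than produced by a network.
    v_nat = np.empty_like(direct_v)
    v_nat[0] = direct_v[0]                  # no past estimate at the first step
    for t in range(1, len(direct_v)):
        projected = (v_nat[t - 1] - rewards[t]) / gamma
        v_nat[t] = beta[t] * direct_v[t] + (1 - beta[t]) * projected
    return v_nat
```

Setting `beta` to all ones recovers the direct estimates, while values near zero trust the past estimate rolled through the rewards, which is what lets the combined estimator track the asymmetric discontinuities at reward arrivals.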