Dimension-free bounds in high-dimensional linear regression via error-in-operator approach
Noskov, Fedor, Puchkin, Nikita, Spokoiny, Vladimir
In contrast to the standard intuition that a learner should search for a trade-off between approximation and estimation errors, researchers have empirically observed that large interpolating rules may still have small test error. Moreover, when the number of parameters exceeds the sample size, the prediction risk of neural networks passes through a U-shaped curve and decreases again [Zhang et al., 2017, Nakkiran et al., 2020]. It later became clear that benign overfitting and double descent are not distinctive features of deep learning. Similar phenomena are ubiquitous for overparametrized models such as random forests and random feature models [Belkin et al., 2019, Mei and Montanari, 2022], kernel methods [Belkin et al., 2018, Liang and Rakhlin, 2020], and linear regression [Bartlett et al., 2020, Hastie et al., 2022], to name a few. In [Belkin et al., 2018], the authors reasonably suggested that we should first better understand more tractable "shallow" methods before diving into deep learning theory. In the present paper, we consider a classical linear regression problem, where a learner aims to estimate an unknown vector θ ∈ R^d from i.i.d. observations.
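A minimal formalization of the setup mentioned in the last sentence, assuming the standard linear observation model (the notation below is ours and is not taken from the abstract): a learner observes n i.i.d. pairs (x_i, y_i) generated as

    y_i = x_i^T θ* + ε_i,   i = 1, ..., n,

where the covariates x_i ∈ R^d are i.i.d., ε_i is zero-mean noise independent of x_i, and the goal is to estimate θ* ∈ R^d, possibly with the dimension d comparable to or exceeding the sample size n.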
Feb-21-2025