Failures of model-dependent generalization bounds for least-norm interpolation

Bartlett, Peter L., Long, Philip M.

arXiv.org Machine Learning 

Deep learning methodology has revealed some striking deficiencies of classical statistical learning theory: large neural networks, trained to zero empirical risk on noisy training data, have good predictive accuracy on independent test data. These methods are overfitting (that is, fitting to the training data better than the noise should allow), but the overfitting is benign (that is, prediction performance is good). It is an important open problem to understand why this is possible.

The presence of noise is key to why the success of interpolating algorithms is mysterious. Generalization of algorithms that produce a perfect fit in the absence of noise has been studied for decades (see [Haussler, 1992] and its references). A number of recent papers have provided generalization bounds for interpolating algorithms in the absence of noise, either for deep networks or in abstract frameworks motivated by deep networks [Li and Liang, 2018; Arora et al., 2019; Cao and Gu, 2019; Feldman, 2020]. The generalization bounds in these papers either do not hold or become vacuous in the presence of noise: Assumption A1 in [Li and Liang, 2018] rules out noisy data; the data-dependent bound in Arora et al. [2019, Theorem 5.1] becomes vacuous when independent noise is added to the y
