Finite-sample analysis of interpolating linear classifiers in the overparameterized regime

Chatterji, Niladri S., Long, Philip M.

arXiv.org Machine Learning 

A surprising statistical phenomenon has emerged in modern machine learning: highly complex models can interpolate training data while still generalizing well to test data, even in the presence of label noise. This is rather striking, as it goes against the grain of classical statistical wisdom, which dictates that predictors that generalize well should trade off between the fit to the training data and some measure of the complexity or smoothness of the predictor. Many estimators, including neural networks, kernel estimators, nearest neighbour estimators, and even linear models, have been shown to demonstrate this phenomenon (see Zhang et al. 2017; Belkin et al. 2019, among others). This phenomenon has recently inspired intense theoretical research. One line of work (Soudry et al. 2018; Ji and Telgarsky 2019; Gunasekar et al. 2017; Nacson, Srebro, and Soudry 2019; Gunasekar et al. 2018a; Gunasekar et al. 2018b) formalized the argument (Neyshabur, Tomioka, and Srebro 2014; Neyshabur 2017) that, even when no explicit regularization is used in training these rich models, there is nevertheless implicit regularization encoded in the choice of optimization method. For example, in the setting of linear classification, Soudry et al. (2018), Ji and Telgarsky (2019), and Nacson, Srebro, and Soudry (2019) show that learning a linear classifier using gradient descent on the unregularized logistic or exponential loss asymptotically leads the solution to converge to the maximum l
