On the Lipschitz Constant of Deep Networks and Double Descent

Matteo Gamba, Hossein Azizpour, Mårten Björkman

arXiv.org Artificial Intelligence 

A longstanding question toward understanding the remarkable generalization ability of deep networks is characterizing the hypothesis class of models trained in practice, thus isolating properties of the networks' model function that capture generalization (Hanin & Rolnick, 2019; Neyshabur et al., 2015). A central problem is understanding the role played by overparameterization (Arora et al., 2018; Neyshabur et al., 2018; Zhang et al., 2018) - a key design choice of state-of-the-art models - in promoting regularization of the model function. Modern overparameterized networks can achieve good generalization while perfectly interpolating the training set (Nakkiran et al., 2019). This phenomenon is described by the double descent curve of the test error (Belkin et al., 2019; Geiger et al., 2019): as model size increases, the error first follows the classical bias-variance trade-off curve (Geman et al., 1992), peaks when the model becomes just large enough to interpolate the training data, and then decreases again as model size grows further.
