On the Lipschitz Constant of Deep Networks and Double Descent
Gamba, Matteo, Azizpour, Hossein, Björkman, Mårten
arXiv.org Artificial Intelligence
A longstanding question in understanding the remarkable generalization ability of deep networks is how to characterize the hypothesis class of models trained in practice, thereby isolating properties of the networks' model function that capture generalization (Hanin & Rolnick, 2019; Neyshabur et al., 2015). A central problem is understanding the role played by overparameterization (Arora et al., 2018; Neyshabur et al., 2018; Zhang et al., 2018) - a key design choice of state-of-the-art models - in promoting regularization of the model function. Modern overparameterized networks can achieve good generalization while perfectly interpolating the training set (Nakkiran et al., 2019). This phenomenon is described by the double descent curve of the test error (Belkin et al., 2019; Geiger et al., 2019): as model size increases, the error follows the classical bias-variance trade-off curve (Geman et al., 1992), peaks when a model is large enough to interpolate the training data, and then decreases again as model size grows further.
Nov-14-2023