Interpolation can hurt robust generalization even when there is no noise

Donhauser, Konstantin; Ţifrea, Alexandru; Aerni, Michael; Heckel, Reinhard; Yang, Fanny

arXiv.org Machine Learning 

Conventional statistical wisdom cautions the practitioner who trains a model by minimizing a loss L(θ): if a global minimizer achieves zero or near-zero training loss (i.e., it interpolates), we run the risk of overfitting (i.e., high variance) and thus sub-optimal prediction performance. Instead, regularization is commonly used to reduce the effect of noise and to obtain an estimator with better generalization. Specifically, regularization limits model complexity and induces a worse fit to the training data, for example via an explicit penalty term R(θ). The resulting penalized loss L(θ) + λR(θ) explicitly imposes certain structural properties on the minimizer. This classical rationale, however, seemingly does not apply to overparameterized models: in practice, large neural networks, for example, exhibit good generalization performance on i.i.d. test data.
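
To make the penalized objective concrete, the sketch below (an illustration, not code from the paper) instantiates L(θ) + λR(θ) as ridge regression, with L(θ) = ||Xθ − y||² and R(θ) = ||θ||². In the overparameterized, noiseless regime described above, letting λ → 0 recovers the minimum-norm interpolator, while larger λ trades data fit for a smaller-norm solution. The data, dimensions, and the helper name ridge are all hypothetical.

    import numpy as np

    # Penalized objective L(theta) + lam * R(theta), instantiated as ridge
    # regression: L(theta) = ||X theta - y||^2, R(theta) = ||theta||^2.
    # Data and dimensions are hypothetical, chosen only for illustration.
    rng = np.random.default_rng(0)
    n, d = 20, 100                      # overparameterized: more parameters than samples
    X = rng.standard_normal((n, d))
    theta_star = rng.standard_normal(d)
    y = X @ theta_star                  # noiseless labels, matching the paper's setting

    def ridge(X, y, lam):
        """Closed-form minimizer of ||X theta - y||^2 + lam * ||theta||^2."""
        d = X.shape[1]
        return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

    for lam in [1e-8, 1e-2, 1.0]:
        theta = ridge(X, y, lam)
        train_resid = np.linalg.norm(X @ theta - y)
        print(f"lambda={lam:g}: train residual={train_resid:.2e}, "
              f"||theta||={np.linalg.norm(theta):.2f}")
    # As lam -> 0 the solution approaches the minimum-norm interpolator
    # (near-zero training residual); increasing lam sacrifices data fit
    # for the structural property (small norm) imposed by the penalty.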