Interpolation can hurt robust generalization even when there is no noise
Konstantin Donhauser, Alexandru Ţifrea, Michael Aerni, Reinhard Heckel, Fanny Yang
Conventional statistical wisdom cautions the user who trains a model by minimizing a loss L(θ): if a global minimizer achieves zero or near-zero training loss (i.e., it interpolates the data), we run the risk of overfitting (i.e., high variance) and thus sub-optimal prediction performance. Instead, regularization is commonly used to reduce the effect of noise and to obtain an estimator with better generalization. Specifically, regularization limits model complexity and induces a worse data fit, for example via an explicit penalty term R(θ). The resulting penalized loss L(θ) + λR(θ) explicitly imposes certain structural properties on the minimizer. This classical rationale, however, seemingly does not apply to overparameterized models: in practice, large neural networks, for example, exhibit good generalization performance on i.i.d. test data even when they interpolate the training data.
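To make the contrast concrete, here is a minimal sketch (not from the paper) comparing a minimum-norm interpolator with an estimator minimizing the penalized loss L(θ) + λR(θ), with R(θ) = ‖θ‖², in a noiseless overparameterized linear model; the dimensions, random seed, and choice λ = 1 are illustrative assumptions.

```python
# Minimal sketch: min-norm interpolation vs. ridge regularization
# in a noiseless overparameterized linear model (d > n).
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 200                            # overparameterized regime: d > n
theta_star = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n, d))
y = X @ theta_star                        # noiseless labels

# Interpolator: minimum-l2-norm solution of X theta = y (zero training loss).
theta_interp = np.linalg.pinv(X) @ y

# Regularized estimator: argmin ||X theta - y||^2 + lam * ||theta||^2.
lam = 1.0
theta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# The ridge fit trades nonzero training loss for shrinkage toward zero.
print("train loss (interp):", np.mean((X @ theta_interp - y) ** 2))
print("train loss (ridge): ", np.mean((X @ theta_ridge - y) ** 2))
print("param error (interp):", np.linalg.norm(theta_interp - theta_star))
print("param error (ridge): ", np.linalg.norm(theta_ridge - theta_star))
```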
Aug-5-2021