Harmless interpolation of noisy data in regression
Vidya Muthukumar, Kailas Vodrahalli, Anant Sahai
In classification problems (i.e., when the labels Y are discrete), the scaling of the test error with respect to n is determined by characterizations of the VC-dimension [2] or Rademacher complexity [3] of the function class, which in the worst case grows with its number of parameters. In regression (i.e., when the labels Y are continuous), the mean-squared error of the ordinary least-squares estimate is governed by the condition number of the regression matrix, which is well behaved for appropriate ratios of d/n but blows up as d approaches n. The qualitative fear is the same: if the function class is too complex, it starts to overfit noise and can generalize poorly to unseen test data.

But there is a gap between "can" and "will", and indeed this conventional wisdom has been challenged by the recent advent of deeper and deeper neural networks. In particular, a thought-provoking paper [4] observed that several deep neural networks generalize well despite achieving zero, or close to zero, training error while being expressive enough to fit even pure noise. As the authors put it, "understanding deep learning requires rethinking generalization". How can we reconcile the existence of good interpolative solutions with the classical bias-variance tradeoff? These phenomena are being actively investigated in a statistical sense [5,6] and a computational sense [7-9], for classification problems and/or noiseless models.
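To make the claim about the condition number concrete, here is a minimal numerical sketch, not taken from the paper: it uses a standard Gaussian design and pure-noise labels (both illustrative assumptions) to show how the condition number of the regression matrix grows as d approaches n, and how the interpolating least-squares fit at d = n drives the training error to zero while the test error explodes.

```python
# Minimal sketch (illustrative assumptions, not the paper's setup):
# Gaussian design X with n samples and d features, pure-noise labels y.
# As d -> n the condition number of X blows up and the OLS fit starts
# to interpolate the noise, hurting test error.
import numpy as np

rng = np.random.default_rng(0)
n = 200                      # number of training samples (assumed)
y = rng.standard_normal(n)   # "pure noise" labels: no signal to learn

for d in (20, 100, 180, 199, 200):
    X = rng.standard_normal((n, d))
    cond = np.linalg.cond(X)                      # ratio of extreme singular values
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # ordinary least squares
    train_mse = np.mean((X @ beta - y) ** 2)
    # Fresh test points from the same design; the true signal is zero,
    # so any excess over MSE ~ 1 is variance from the ill-conditioned fit.
    X_test = rng.standard_normal((1000, d))
    y_test = rng.standard_normal(1000)
    test_mse = np.mean((X_test @ beta - y_test) ** 2)
    print(f"d={d:3d}  cond(X)={cond:10.1f}  train MSE={train_mse:.3f}  test MSE={test_mse:.2f}")
```

At d = n the design matrix is square and (almost surely) invertible, so the fit interpolates the noise exactly; the printed test error at that point illustrates the "astronomical" variance inflation the paragraph above refers to.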
Mar-21-2019