It's an Alignment, Not a Trade-off: Revisiting Bias and Variance in Deep Models
Lin Chen, Michal Lukasik, Wittawat Jitkrittum, Chong You, Sanjiv Kumar
The concepts of bias and variance, obtained from decomposing the generalization error, are of fundamental importance in machine learning. Classical wisdom suggests that there is a trade-off between bias and variance: models of low capacity have high bias and low variance, while models of high capacity have low bias and high variance. This understanding has served as an important guiding principle for developing generalizable machine learning models, suggesting that they should be neither too large nor too small [Bishop, 2006]. Recently, a line of research found that deep models defy this classical wisdom [Belkin et al., 2019]: their variance curves are unimodal, first increasing with model size and then decreasing beyond the point at which the models can perfectly fit the training data [Neal et al., 2018, Yang et al., 2020]. While the unimodal variance curve explains why over-parameterized deep models generalize well, there is still a lack of understanding of why it occurs. This paper revisits the study of bias and variance to understand their behavior in deep models. We perform a per-sample measurement of bias and variance in popular deep classification models. Our study reveals a curious phenomenon that is radically different from the classical trade-off perspective on bias and variance, yet is concordant with more recent works [Belkin et al., 2019, Hastie et al., 2022, Mei and Montanari, 2022].
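To make the per-sample measurement concrete, here is a minimal sketch of how bias and variance can be estimated at each test point from the predictions of several independently trained models, assuming the standard squared-error decomposition on one-hot labels. The function name `per_sample_bias_variance` and its arguments are hypothetical illustrations; the paper's actual estimator may differ (for instance, Yang et al. [2020] use an unbiased estimator over random training splits).

```python
import numpy as np

def per_sample_bias_variance(prob_list, labels_onehot):
    """Estimate per-sample bias^2 and variance from T independently
    trained models, via the squared-error decomposition on one-hot labels:

        E||f(x) - y||^2 = ||E f(x) - y||^2 + E||f(x) - E f(x)||^2
                        =      bias^2      +       variance

    prob_list:     array of shape (T, N, C), predicted class
                   probabilities from T models on N test samples.
    labels_onehot: array of shape (N, C), one-hot test labels.
    Returns two arrays of shape (N,): bias^2 and variance per sample.
    """
    probs = np.asarray(prob_list)                                   # (T, N, C)
    mean_pred = probs.mean(axis=0)                                  # (N, C): average prediction E f(x)
    bias_sq = ((mean_pred - labels_onehot) ** 2).sum(axis=1)        # (N,): squared distance of mean prediction to label
    variance = ((probs - mean_pred) ** 2).sum(axis=2).mean(axis=0)  # (N,): spread of predictions around their mean
    return bias_sq, variance
```

In this sketch the expectation over training randomness is approximated by averaging over the T models (e.g., different random seeds or data splits); averaging the per-sample quantities over the test set recovers the usual aggregate bias and variance curves.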
Oct-13-2023