Understanding Generalization in Adversarial Training via the Bias-Variance Decomposition
Yaodong Yu, Zitong Yang, Edgar Dobriban, Jacob Steinhardt, Yi Ma
Adversarial training enhances the robustness of deep neural networks at the cost of decreased accuracy on clean test samples [Goodfellow et al., 2014; Madry et al., 2017; Sinha et al., 2017]. Although the model can fit the training data perfectly under adversarial training, the generalization error on the clean test dataset increases compared with non-adversarially trained models. For example, in the rightmost panel of Figure 1(b), even though an adversarially trained model achieves almost zero error on the clean training data (up to a certain level of perturbation ε), the error on the clean test data (the blue curve) keeps increasing with ε. Hence, to improve both the robustness and the accuracy of (adversarially trained) deep networks, it is crucial to understand the cause of this increased "generalization gap" between errors on the (clean) training dataset and the (clean) test dataset. To better understand the generalization gap, we turn to a standard tool of statistical learning theory, the bias-variance decomposition [Markov, 1900; Lehmann, 1983; Casella and Berger, 1990; Hastie et al., 2009; Geman et al., 1992]. A large variance corresponds to instability of the model, whereas a large bias suggests that the model predicts poorly on average. Bias and variance provide more information about the generalization gap than the test error alone: we can better assess whether an explanation works by checking whether it predicts both the bias and the variance. How does adversarial training affect the bias and the variance?
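To make the decomposition concrete, the sketch below shows one common way to estimate the bias and variance terms empirically under squared loss: train the same architecture on several independent training subsets, evaluate all resulting models on a clean test set, and decompose the error of the predictions. The names `train_model` and `train_splits` are hypothetical placeholders (not from the paper), and the sketch assumes a regression-style squared-loss decomposition rather than the paper's exact estimator.

```python
import numpy as np

def estimate_bias_variance(train_model, train_splits, X_test, y_test):
    """Rough estimate of bias^2 and variance under the squared-loss decomposition.

    train_model(X, y) -> fitted model with a .predict(X) method (hypothetical;
    e.g. adversarial training at a fixed perturbation level eps).
    train_splits: list of independent (X_train, y_train) subsets.
    """
    # Predictions of each independently trained model on the clean test set,
    # shape (num_models, num_test_points).
    preds = np.stack([train_model(X, y).predict(X_test) for X, y in train_splits])

    mean_pred = preds.mean(axis=0)                 # average prediction per test point
    variance = preds.var(axis=0).mean()            # instability across training sets
    bias_sq = ((mean_pred - y_test) ** 2).mean()   # error of the average prediction

    return bias_sq, variance
```

Under this decomposition, the expected clean test MSE is approximately bias^2 + variance (plus irreducible noise), so repeating the estimate at different perturbation levels ε indicates which term drives the growing generalization gap.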
Mar-17-2021