On the Onset of Robust Overfitting in Adversarial Training
Chaojian Yu, Xiaolong Shi, Jun Yu, Bo Han, Tongliang Liu
Adversarial Training (AT) is a widely used algorithm for building robust neural networks, but it suffers from robust overfitting, the fundamental mechanism of which remains unclear. In this work, we treat normal data and adversarial perturbation as separate factors and, through factor ablation in AT, identify that the underlying causes of robust overfitting stem from the normal data. Furthermore, we explain the onset of robust overfitting as a result of the model learning features that lack robust generalization, which we refer to as non-effective features. Specifically, we provide a detailed analysis of how non-effective features are generated and how they lead to robust overfitting. Additionally, we explain various empirical behaviors observed in robust overfitting and revisit different techniques for mitigating it from the perspective of non-effective features, providing a comprehensive understanding of the robust overfitting phenomenon. This understanding inspires us to propose two measures, attack strength and data augmentation, that hinder the network from learning non-effective features, thereby alleviating robust overfitting. Extensive experiments on benchmark datasets demonstrate the effectiveness of the proposed methods in mitigating robust overfitting and enhancing adversarial robustness.

Adversarial Training (AT) (Madry et al., 2018) has emerged as a reliable method for improving a model's robustness against adversarial attacks (Szegedy et al., 2014; Goodfellow et al., 2015). It trains networks on adversarial data generated on-the-fly and has proven to be one of the most effective empirical defenses (Athalye et al., 2018). AT has succeeded in building robust neural networks on MNIST, but achieving the same goal on more complex datasets such as CIFAR-10 has proven challenging (Madry et al., 2018).
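To make the on-the-fly generation of adversarial data concrete, below is a minimal PyTorch sketch of one PGD-based adversarial training step in the spirit of Madry et al. (2018). The function names and the hyperparameter values (epsilon = 8/255, step size 2/255, 10 attack steps) are common CIFAR-10 conventions assumed for illustration, not details taken from this paper.

import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, epsilon=8/255, alpha=2/255, num_steps=10):
    # Craft L-infinity bounded adversarial examples via projected gradient
    # descent; hyperparameters are illustrative, not from the paper.
    x_adv = x + torch.empty_like(x).uniform_(-epsilon, epsilon)  # random start
    x_adv = x_adv.clamp(0, 1).detach()
    for _ in range(num_steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascend the loss, then project back into the epsilon-ball around x.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon).clamp(0, 1)
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, y):
    # One AT step: generate adversarial data on-the-fly, then train on it.
    model.eval()
    x_adv = pgd_attack(model, x, y)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()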
arXiv.org Artificial Intelligence
Oct-1-2023