Efficient local linearity regularization to overcome catastrophic overfitting

Elias Abad Rocamora, Fanghui Liu, Grigorios G. Chrysos, Pablo M. Olmos, Volkan Cevher

arXiv.org Artificial Intelligence 

For models trained with multi-step adversarial training (AT), the loss function has been observed to behave locally linearly with respect to the input; this property is lost in single-step AT. To address catastrophic overfitting (CO) in single-step AT, several methods enforce local linearity of the loss via regularization. In this work, we instead introduce a regularization term, called ELLE, that mitigates CO effectively and efficiently in classical AT evaluations as well as in more difficult regimes, e.g., large adversarial perturbations and long training schedules. Our regularization term can be theoretically linked to the curvature of the loss function and is computationally cheaper than previous methods because it avoids Double Backpropagation. Our thorough experimental validation demonstrates that our method does not suffer from CO, even in challenging settings where previous works do. We also find that adapting the regularization parameter during training (ELLE-A) greatly improves performance, especially in large-ε setups.

Figure (caption): We train with our method ELLE and its adaptive regularization variant ELLE-A.

Partially done at Universidad Carlos III de Madrid; correspondence: elias.abadrocamora@epfl.ch

Adversarial Training (AT) (Madry et al., 2018) and TRADES (Zhang et al., 2019) have emerged as prominent methods for training robust architectures. However, these training mechanisms solve an inner optimization problem at every training step, often requiring an order of magnitude more time per iteration than standard training (Xu et al., 2023). To reduce this per-iteration overhead, the inner maximization problem is commonly solved in a single step. While this approach offers efficiency gains, it is known to be unstable (Tramèr et al., 2018; Shafahi et al., 2019; Wong et al., 2020; de Jorge et al., 2022): CO is characterized by a sharp decline (even down to 0%) in multi-step test adversarial accuracy and a corresponding spike (up to 100%) in single-step train adversarial accuracy.

Explicitly enforcing local linearity has been shown to reduce the number of steps needed to solve the inner maximization problem while avoiding CO and gradient obfuscation (Qin et al., 2019; Andriushchenko and Flammarion, 2020). Nevertheless, all existing methods incur a roughly 3× runtime overhead due to Double Backpropagation (Etmann, 2019). Given the cost of this operation, a natural question arises: can we efficiently overcome catastrophic overfitting when enforcing local linearity of the loss? The sketches below illustrate single-step AT, the Double Backpropagation bottleneck, and a linearity check that needs only loss evaluations.
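For concreteness, here is a minimal PyTorch sketch of the single-step AT update the passage refers to (FGSM-style); `model`, `optimizer`, and the ℓ∞ budget `eps` are placeholders rather than the paper's exact training setup:

```python
import torch
import torch.nn.functional as F

def single_step_at_update(model, optimizer, x, y, eps):
    """One single-step AT iteration: the inner maximization
    max_{||delta||_inf <= eps} loss(x + delta, y) is approximated by a
    single signed-gradient (FGSM) step instead of multi-step PGD."""
    delta = torch.zeros_like(x, requires_grad=True)
    loss = F.cross_entropy(model(x + delta), y)
    grad = torch.autograd.grad(loss, delta)[0]
    x_adv = (x + eps * grad.sign()).clamp(0.0, 1.0)  # assumes inputs in [0, 1]

    # Standard outer minimization step on the adversarial example.
    optimizer.zero_grad()
    F.cross_entropy(model(x_adv), y).backward()
    optimizer.step()
```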
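The ~3× overhead mentioned above comes from differentiating through an input gradient. The following is a generic sketch of such a Double Backpropagation penalty, illustrating the cost structure rather than any specific prior regularizer:

```python
def double_backprop_penalty(model, x, y):
    """Gradient-based linearity/curvature penalties differentiate through
    the input gradient itself: create_graph=True keeps the graph of the
    first backward pass so the penalty can be backpropagated, adding an
    extra backward pass on top of the usual forward/backward."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad_x = torch.autograd.grad(loss, x, create_graph=True)[0]
    return grad_x.flatten(1).pow(2).sum(dim=1).mean()
```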
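The passage does not reproduce the ELLE regularizer itself; purely as an illustration of how local linearity can be penalized with loss evaluations alone (an assumption about the flavor of the method, not its exact definition), a linearity error can compare the loss at a convex combination of two randomly perturbed points with the same combination of their losses:

```python
def local_linearity_error(model, x, y, eps):
    """Illustrative finite-difference linearity check: if the loss is
    locally linear in the input, loss(a*x_a + (1-a)*x_b) equals
    a*loss(x_a) + (1-a)*loss(x_b). Only forward passes and one standard
    backward pass are needed, avoiding Double Backpropagation."""
    delta_a = (2 * torch.rand_like(x) - 1) * eps  # uniform in the l_inf ball
    delta_b = (2 * torch.rand_like(x) - 1) * eps
    alpha = torch.rand(()).item()
    x_a, x_b = x + delta_a, x + delta_b
    x_c = alpha * x_a + (1 - alpha) * x_b

    loss_a = F.cross_entropy(model(x_a), y)
    loss_b = F.cross_entropy(model(x_b), y)
    loss_c = F.cross_entropy(model(x_c), y)
    return (loss_c - alpha * loss_a - (1 - alpha) * loss_b) ** 2
```

Such a term, scaled by a (possibly adaptive) regularization weight, could then be added to the single-step AT loss; this is a sketch under the stated assumptions, not the paper's exact formulation.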