b8ce47761ed7b3b6f48b583350b7f9e4-AuthorFeedback.pdf

Neural Information Processing Systems

We thank all reviewers for the encouraging feedback and detailed comments, which we will integrate into the next version. Why is a noise-sensitive filter learned by FGSM training? After running AutoAttack, we observe that it proportionally reduces the adversarial accuracy for all methods. Their exact goal is to make the loss surface smoother. R2: Line 118: How is the alpha step-size tuned for these experiments?


Understanding and Improving Fast Adversarial Training

Neural Information Processing Systems

A recent line of work focused on making adversarial training computationally efficient for deep learning models. In particular, Wong et al. (2020) showed that $\ell_\infty$-adversarial training with the fast gradient sign method (FGSM) can fail due to a phenomenon called catastrophic overfitting, in which the model quickly loses its robustness over a single epoch of training. We show that adding a random step to FGSM, as proposed in Wong et al. (2020), does not prevent catastrophic overfitting, and that randomness is not important per se --- its main role being simply to reduce the magnitude of the perturbation. Moreover, we show that catastrophic overfitting is not inherent to deep and overparametrized networks, but can occur in a single-layer convolutional network with a few filters. In an extreme case, even a single filter can make the network highly non-linear locally, which is the main reason why FGSM training fails. Based on this observation, we propose a new regularization method, GradAlign, that prevents catastrophic overfitting by explicitly maximizing the gradient alignment inside the perturbation set and improves the quality of the FGSM solution. As a result, GradAlign makes it possible to successfully apply FGSM training also for larger $\ell_\infty$-perturbations and reduces the gap to multi-step adversarial training.
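The gradient-alignment quantity behind GradAlign can be sketched in a few lines: the regularizer penalizes $1 - \cos(\nabla_x \ell(x), \nabla_x \ell(x+\eta))$ for a random point $x+\eta$ in the $\ell_\infty$ perturbation set. Below is a minimal numpy illustration using a linear logistic model with an analytic input gradient; all function and variable names are our own, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_grad(w, x, y):
    # Gradient of the logistic loss log(1 + exp(-y * w.x)) w.r.t. the INPUT x.
    margin = y * (w @ x)
    return -y * w / (1.0 + np.exp(margin))

def grad_align_penalty(w, x, y, eps):
    # GradAlign-style penalty: 1 - cos(grad at x, grad at a random point x + eta),
    # with eta drawn uniformly from the l_inf ball of radius eps around x.
    eta = rng.uniform(-eps, eps, size=x.shape)
    g1 = loss_grad(w, x, y)
    g2 = loss_grad(w, x + eta, y)
    cos = (g1 @ g2) / (np.linalg.norm(g1) * np.linalg.norm(g2) + 1e-12)
    return 1.0 - cos
```

For a linear model the input gradient is always a positive multiple of $-y\,w$, so the penalty is essentially zero everywhere; catastrophic overfitting corresponds to this quantity blowing up for nonlinear networks, which is exactly what the regularizer suppresses.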


Make Some Noise: Reliable and Efficient Single-Step Adversarial Training

Neural Information Processing Systems

Recently, Wong et al. (2020) showed that adversarial training with single-step FGSM leads to a characteristic failure mode named catastrophic overfitting (CO), in which a model becomes suddenly vulnerable to multi-step attacks. Experimentally, they showed that simply adding a random perturbation prior to FGSM (RS-FGSM) could prevent CO. However, Andriushchenko & Flammarion (2020) observed that RS-FGSM still leads to CO for larger perturbations, and proposed a computationally expensive regularizer (GradAlign) to avoid it. In this work, we methodically revisit the role of noise and clipping in single-step adversarial training. Contrary to previous intuitions, we find that using a stronger noise around the clean sample combined with \textit{not clipping} is highly effective in avoiding CO for large perturbation radii. We then propose Noise-FGSM (N-FGSM) that, while providing the benefits of single-step adversarial training, does not suffer from CO. Empirical analyses on a large suite of experiments show that N-FGSM is able to match or surpass the performance of the previous state-of-the-art GradAlign while achieving a 3$\times$ speed-up.
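The two ingredients the abstract highlights (stronger noise, no projection back onto the $\epsilon$-ball) can be sketched as follows. This is an illustrative numpy version under our own assumptions: the noise radius multiplier `k=2.0` and step size `alpha = eps` mirror commonly reported defaults but should be treated as placeholders, and the names are not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)

def n_fgsm_example(x, grad, eps, k=2.0):
    # N-FGSM-style perturbation: sample noise from a *larger* ball of radius
    # k * eps, take an FGSM step on top of it, and do NOT project the result
    # back onto the eps-ball around x.
    eta = rng.uniform(-k * eps, k * eps, size=x.shape)
    x_adv = x + eta + eps * np.sign(grad)
    # Clip only to the valid pixel range, not to the eps-ball.
    return np.clip(x_adv, 0.0, 1.0)
```

Note that the resulting perturbation can reach $\ell_\infty$ magnitude $(k+1)\epsilon$; giving up the $\epsilon$-ball projection is precisely the "not clipping" choice the paper identifies as key to avoiding CO.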


Appendix A Deferred proofs

Neural Information Processing Systems

In this section, we show the proofs omitted from Sec. 3 and Sec. 4.

A.1 Proof of Lemma 1. We restate Lemma 1 from Sec. 3 and present the proof. First, note that by Jensen's inequality we obtain a convenient upper bound. For this purpose, in Figure 1 we plot the quantities involved. Figure 9: Visualization of the key quantities involved in Lemma 2.

We list detailed evaluation and training details below. The single-layer CNN that we study in Sec. 4 has 4 convolutional filters.

We describe here supporting experiments and visualizations related to Sec. 3 and Sec. 4.

C.1 Quality of the linear approximation for ReLU networks. The phenomenon is even more pronounced for FGSM perturbations, as the linearization error is much higher there.

C.2 Catastrophic overfitting in a single-layer CNN. We describe here complementary figures to Sec. 4 related to the single-layer CNN, including a Laplace filter, which is very sensitive to noise.
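The linearization error mentioned in C.1 is the gap between the loss and its first-order Taylor expansion, $|\ell(x+\delta) - \ell(x) - \nabla \ell(x)^\top \delta|$, the quantity FGSM implicitly assumes is small. A toy numpy probe on a one-hidden-layer ReLU network (our own construction, not the paper's code) makes this concrete:

```python
import numpy as np

def loss(x, W1, w2, y):
    # One-hidden-layer ReLU network with squared loss.
    h = np.maximum(W1 @ x, 0.0)
    return 0.5 * (w2 @ h - y) ** 2

def num_grad(f, x, h=1e-6):
    # Central-difference numerical gradient of a scalar function f at x.
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return g

def lin_error(x, delta, W1, w2, y):
    # |l(x + delta) - l(x) - grad_l(x) . delta|: how far the loss deviates
    # from its first-order Taylor expansion at x.
    g = num_grad(lambda z: loss(z, W1, w2, y), x)
    return abs(loss(x + delta, W1, w2, y) - loss(x, W1, w2, y) - g @ delta)
```

For tiny perturbations the error is negligible, while it grows quickly with the perturbation size, matching the appendix's observation that the linearization error is much higher at FGSM-sized steps.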





