MALT Powers Up Adversarial Attacks

Neural Information Processing Systems

Current adversarial attacks for multi-class classifiers choose potential adversarial target classes naively, based on the classifier's confidence levels. We present a novel adversarial targeting method, MALT (Mesoscopic Almost Linearity Targeting), based on local almost-linearity assumptions. Our attack outperforms the current state of the art, AutoAttack, on the standard benchmark datasets CIFAR-100 and ImageNet and across different robust models. In particular, our attack uses a five times faster attack strategy than AutoAttack's while matching AutoAttack's successes and additionally attacking samples that were previously out of reach. We further prove formally, and demonstrate empirically, that our targeting method, although inspired by linear predictors, also applies to non-linear models.
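The core idea of linearity-based targeting can be sketched as follows: for a locally linear model, the distance to the decision boundary between the predicted class and a candidate target is approximately the logit gap divided by the norm of the gradient of that gap, so targets are ranked by this ratio rather than by raw confidence. The sketch below (function name and interface are our own, not the authors' reference implementation) illustrates the ranking, assuming per-class input gradients are available:

```python
import numpy as np

def malt_style_targets(logits, grads, top_k=5):
    """Rank candidate target classes by estimated distance to the decision
    boundary under a local-linearity assumption, instead of by confidence.

    logits: (C,) model outputs for one input
    grads:  (C, D) gradients of each class logit w.r.t. the input
    Returns the top_k easiest-looking target classes (smallest estimated
    distance first). Illustrative sketch only.
    """
    c = int(np.argmax(logits))            # currently predicted class
    scores = {}
    for t in range(len(logits)):
        if t == c:
            continue
        gap = logits[c] - logits[t]       # logit margin to class t
        sens = np.linalg.norm(grads[c] - grads[t]) + 1e-12
        scores[t] = gap / sens            # linear estimate of boundary distance
    return sorted(scores, key=scores.get)[:top_k]
```

Note that a class with a larger logit gap can still be the easier target if the gap shrinks quickly along the input direction, which is exactly what a confidence-only ranking misses.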




Supplement

Neural Information Processing Systems

In this section, we give an overview of related work on stable neural ODE networks. We also give an overview of common adversarial attacks and of recent works that defend against adversarial examples. Stable Neural Networks. Gradient vanishing and gradient exploding are two well-known phenomena in deep learning [1]. The gradient of the objective function, which depends strongly on both the training method and the neural network architecture, indicates how sensitive the output is with respect to (w.r.t.) input perturbations. An exploding gradient implies instability of the output w.r.t. the input and thus results in a non-robust learning architecture.
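The sensitivity argument above can be made concrete with a toy deep linear chain: by the chain rule, the input-output Jacobian is a product of per-layer Jacobians, so its norm grows or shrinks exponentially with depth depending on the per-layer scale. The following sketch (our own illustration, not tied to any specific architecture in the paper) shows both regimes:

```python
import numpy as np

rng = np.random.default_rng(0)
depth, width = 50, 32

def input_gradient_norm(scale):
    """Norm of d(output)/d(input) for a deep linear chain y = W_L ... W_1 x.

    Each layer is a random matrix with spectral scale roughly `scale`.
    With scale > 1 the input gradient explodes with depth (output is
    hypersensitive to input perturbations); with scale < 1 it vanishes.
    Toy illustration of the stability argument.
    """
    J = np.eye(width)
    for _ in range(depth):
        W = scale * rng.standard_normal((width, width)) / np.sqrt(width)
        J = W @ J                     # chain rule: per-layer Jacobians multiply
    return np.linalg.norm(J)
```

With `scale=1.5` the gradient norm is astronomically large after 50 layers, while `scale=0.5` drives it toward zero; stable (e.g., ODE-inspired) architectures aim to keep this product well conditioned.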




contributions and relation to prior work

Neural Information Processing Systems

We thank the reviewers for their helpful comments. Below, we address some of the points made regarding our work.

On automated attacks (Reviewers 1 and 3). Reviewer 1 and Reviewer 3 argue that "AutoAttack" (Croce & Hein) [...] the "k-winners take all" defense (19% accuracy), whereas we reduce it to 0% accuracy [...] ("Adversarial Training" and "Are Generative Classifiers More Robust"). Of the 13 defenses we study, 5 aim at detecting adversarial examples. AutoAttack also cannot be directly applied to "Temporal Dependency" (a speech-to-text model) or to "Robust Sparse Fourier Transform" (which is aimed at perturbations of small null [...]). We believe AutoAttack is a strong, non-adaptive baseline; the above points illustrate why. We apologize for not clarifying this in the paper. We still view these as white-box attacks.

On related work and technical novelty (Reviewer 3). We view the fact that defenses are broken by existing techniques [...] This is what differentiates our work from prior work that proposed and argued for adaptive attacks (e.g., Carlini & [...]).

