Exploiting Excessive Invariance caused by Norm-Bounded Adversarial Robustness
Jacobsen, Jörn-Henrik, Behrmann, Jens, Carlini, Nicholas, Tramèr, Florian, Papernot, Nicolas
Adversarial examples are malicious inputs crafted to cause a model to misclassify them. Their most common instantiation, "perturbation-based" adversarial examples, introduces changes to the input that leave its true label unchanged, yet result in a different model prediction. Conversely, "invariance-based" adversarial examples insert changes to the input that leave the model's prediction unaffected despite the underlying input's label having changed. In this paper, we demonstrate that robustness to perturbation-based adversarial examples is not only insufficient for general robustness, but worse, it can also increase the model's vulnerability to invariance-based adversarial examples. We mount attacks that exploit excessive model invariance in directions relevant to the task and are able to find adversarial examples within the ℓ∞ ball. Excessive invariance is not limited to models trained to be robust to perturbation-based ℓ∞-norm adversaries. Accordingly, we call for a set of precise definitions that taxonomize and address each of these shortcomings in learning.
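For concreteness, the two notions contrasted in the abstract can be stated roughly as follows (a sketch in illustrative notation, not taken verbatim from the paper; here $f$ denotes the classifier, $O$ the oracle assigning true labels, $x$ the original input, and $\epsilon$ the perturbation budget):

$$\text{Perturbation-based: find } x' \text{ with } \|x' - x\|_\infty \le \epsilon \text{ such that } O(x') = O(x) \text{ but } f(x') \neq f(x).$$
$$\text{Invariance-based: find } x' \text{ with } \|x' - x\|_\infty \le \epsilon \text{ such that } f(x') = f(x) \text{ but } O(x') \neq O(x).$$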
Mar-25-2019