anishathalye/obfuscated-gradients
Above is an adversarial example: the slightly perturbed image of the cat fools an InceptionV3 classifier into classifying it as "guacamole". Such "fooling images" are easy to synthesize using gradient descent (Szegedy et al. 2013). In our recent paper, we evaluate the robustness of eight papers accepted to ICLR 2018 as defenses to adversarial examples. We find that seven of the eight defenses provide a limited increase in robustness and can be broken by improved attack techniques we develop. The only defense we observe that significantly increases robustness to adversarial examples within the threat model proposed is "Towards Deep Learning Models Resistant to Adversarial Attacks" (Madry et al. 2018), and we were unable to defeat this defense without stepping outside the threat model.
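As an illustration of how little machinery such an attack needs, below is a minimal sketch of a targeted projected gradient descent (PGD) attack. It is not code from this repository: the use of PyTorch and torchvision's pretrained InceptionV3, the perturbation budget, step size, iteration count, and the ImageNet class index used for "guacamole" are all illustrative assumptions.

```python
# Minimal sketch of a targeted PGD attack (not code from this repository).
# Assumptions: torchvision's pretrained InceptionV3, inputs in [0, 1],
# an L-infinity budget epsilon, and 924 as the ImageNet index for "guacamole".
import torch
import torch.nn.functional as F
import torchvision.models as models


def targeted_pgd(model, x, target, epsilon=8 / 255, step_size=1 / 255, steps=100):
    """Search the L-infinity ball of radius epsilon around x for an image the
    model classifies as `target`, by gradient descent on the cross-entropy."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), target)
        grad, = torch.autograd.grad(loss, x_adv)
        # Step toward the target class, then project back into the epsilon-ball
        # around the original image and the valid pixel range.
        x_adv = x_adv.detach() - step_size * grad.sign()
        x_adv = torch.max(torch.min(x_adv, x + epsilon), x - epsilon).clamp(0, 1)
    return x_adv


model = models.inception_v3(weights="IMAGENET1K_V1").eval()
x = torch.rand(1, 3, 299, 299)    # stand-in for the preprocessed cat image
target = torch.tensor([924])      # assumed ImageNet index for "guacamole"
x_adv = targeted_pgd(model, x, target)
print(model(x_adv).argmax(dim=1), (x_adv - x).abs().max())
```

An untargeted variant of the same attack simply ascends the loss on the true label instead of descending it on the chosen target class.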