Provable defenses against adversarial examples via the convex outer adversarial polytope

Jun-8-2018–arXiv.org Artificial Intelligence

We propose a method to learn deep ReLU-based classifiers that are provably robust against norm-bounded adversarial perturbations on the training data. For previously unseen examples, the approach is guaranteed to detect all adversarial examples, though it may flag some non-adversarial examples as well. The basic idea is to consider a convex outer approximation of the set of activations reachable through a norm-bounded perturbation, and we develop a robust optimization procedure that minimizes the worst-case loss over this outer region (via a linear program). Crucially, we show that the dual problem to this linear program can be represented itself as a deep network similar to the backpropagation network, leading to very efficient optimization approaches that produce guaranteed bounds on the robust loss. The end result is that by executing a few more forward and backward passes through a slightly modified version of the original network (though possibly with much larger batch sizes), we can learn a classifier that is provably robust to any norm-bounded adversarial attack. We illustrate the approach on a number of tasks to train classifiers with robust adversarial guarantees (e.g. for MNIST, we produce a convolutional classifier that provably has less than 5.8% test error for any adversarial attack with bounded l
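
To make the idea of a convex outer approximation concrete, the following is a minimal sketch for one linear layer followed by a ReLU. It is not the authors' implementation, and it does not cover the dual network or the training procedure. It assumes an l-infinity perturbation budget `eps`, and it assumes the relaxation takes the common "triangle" form for a ReLU whose pre-activation lies in an interval `[l, u]` with `l < 0 < u`; the abstract itself does not spell out the relaxation. All function names are hypothetical.

```python
def linear_bounds(W, b, x, eps):
    """Elementwise bounds on W(x + delta) + b over all ||delta||_inf <= eps.

    W is a list of rows, b and x are lists of floats (illustrative layout).
    """
    lower, upper = [], []
    for row, bias in zip(W, b):
        center = sum(w * xi for w, xi in zip(row, x)) + bias
        # The worst-case shift of one output is eps times the row's l1 norm.
        radius = eps * sum(abs(w) for w in row)
        lower.append(center - radius)
        upper.append(center + radius)
    return lower, upper


def relu_upper_line(l, u):
    """Slope and intercept of the line through (l, 0) and (u, u).

    This line is the upper face of the assumed triangle relaxation.
    """
    if not l < 0 < u:
        raise ValueError("the relaxation is only needed when l < 0 < u")
    slope = u / (u - l)
    return slope, -slope * l


def in_relaxation(x, z, l, u, tol=1e-12):
    """Check whether (x, z) satisfies z >= 0, z >= x, z <= slope * x + intercept."""
    slope, intercept = relu_upper_line(l, u)
    return z >= -tol and z >= x - tol and z <= slope * x + intercept + tol


if __name__ == "__main__":
    # Toy weights and input, chosen only for illustration.
    W = [[1.0, -2.0], [0.5, 0.5]]
    b = [0.0, -1.0]
    lower, upper = linear_bounds(W, b, x=[1.0, 1.0], eps=0.1)
    print(lower, upper)
    # The second unit straddles zero, so its ReLU needs the relaxation.
    print(relu_upper_line(lower[1], upper[1]))
```

Every point `(x, max(x, 0))` with `x` in `[l, u]` lies inside the relaxed region, which is what makes it an outer approximation: a bound proved over the relaxed region also holds for the true network.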

artificial intelligence, classifier, machine learning

Country:
- Asia (0.04)
- North America > United States
  - Pennsylvania > Allegheny County > Pittsburgh (0.14)
- Europe > Sweden
  - Stockholm > Stockholm (0.04)

Genre:
- Research Report (0.50)

Industry:
- Information Technology > Security & Privacy (0.87)
- Government > Military (0.69)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Optimization (1.00)
  - Machine Learning
    - Statistical Learning (1.00)
    - Neural Networks > Deep Learning (0.46)
