Formal Guarantees on the Robustness of a Classifier against Adversarial Manipulation
Hein, Matthias, Andriushchenko, Maksym
Neural Information Processing Systems
Recent work has shown that state-of-the-art classifiers are quite brittle: a small adversarial change to an input that was originally classified correctly with high confidence leads to a wrong classification, again with high confidence. This raises concerns that such classifiers are vulnerable to attacks and calls into question their use in safety-critical systems. In this paper we show, for the first time, formal guarantees on the robustness of a classifier by giving instance-specific lower bounds on the norm of the input manipulation required to change the classifier decision. Based on this analysis we propose the Cross-Lipschitz regularization functional. We show that using this form of regularization in kernel methods resp.
Feb-14-2020, 09:56:10 GMT