Provably Robust Deep Learning via Adversarially Trained Smoothed Classifiers

Salman, Hadi, Li, Jerry, Razenshteyn, Ilya, Zhang, Pengchuan, Zhang, Huan, Bubeck, Sebastien, Yang, Greg

Neural Information Processing Systems 

In this paper, we employ adversarial training to improve the performance of randomized smoothing. We design an adapted attack for smoothed classifiers, and we show how this attack can be used in an adversarial training setting to boost the provable robustness of smoothed classifiers. Moreover, we find that pre-training and semi-supervised learning boost adversarially trained smoothed classifiers even further.
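
The abstract does not spell out what a smoothed classifier is. As randomized smoothing is commonly described, the smoothed classifier returns the class that a base classifier predicts most often when Gaussian noise is added to the input. The sketch below illustrates only that prediction step by Monte Carlo voting. It is not the paper's attack or training procedure, and `base_classifier`, `smoothed_predict`, `sigma`, and `n_samples` are hypothetical names and parameters chosen for illustration.

```python
import random


def base_classifier(x):
    # Hypothetical stand-in for a neural network:
    # a fixed linear decision rule on 2-D inputs.
    return int(x[0] + x[1] > 0.0)


def smoothed_predict(f, x, sigma, n_samples, rng, num_classes=2):
    """Estimate the smoothed prediction at x by majority vote.

    Each vote is the base classifier's label on x plus isotropic
    Gaussian noise with standard deviation sigma.
    """
    counts = [0] * num_classes
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        counts[f(noisy)] += 1
    # The most frequent label is the Monte Carlo estimate of the
    # smoothed classifier's output.
    label = max(range(num_classes), key=counts.__getitem__)
    return label, counts


rng = random.Random(0)  # fixed seed so the sketch is reproducible
label, counts = smoothed_predict(
    base_classifier, [2.0, 2.0], sigma=0.5, n_samples=1000, rng=rng
)
```

The vote counts are what a certification procedure would consume: the larger the margin of the winning class, the larger the radius that can be certified. An adversarial training scheme of the kind the abstract describes would aim to widen that margin on perturbed inputs.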