Certified Adversarial Robustness
Certified Adversarial Robustness via Randomized \alpha-Smoothing for Regression Models
Certified adversarial robustness of large-scale deep networks has progressed substantially since the introduction of randomized smoothing. Deep net classifiers are now provably robust in their predictions against a large class of threat models, including \ell_1, \ell_2, and \ell_\infty norm-bounded attacks. However, certified robustness analysis via randomized smoothing has not been performed for deep regression networks, where the output variable is continuous and unbounded. In this paper, we extend the existing results for randomized smoothing to regression models using powerful tools from robust statistics, in particular the \alpha-trimming filter as the smoothing function. Adjusting the hyperparameter \alpha achieves a smooth trade-off between the desired certified robustness and utility.
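To make the smoothing concrete, here is a minimal sketch of an \alpha-trimmed aggregator over noisy copies of the input; the Gaussian noise model, the sample count, and the function names are illustrative assumptions rather than the paper's exact construction.

```python
# Minimal sketch: alpha-trimmed smoothing of a regression model.
# Gaussian input noise and all names here are illustrative assumptions.
import numpy as np

def alpha_trimmed_smooth(base_regressor, x, sigma=0.25, n_samples=1000, alpha=0.1):
    """Smoothed regression output: alpha-trimmed mean over noisy copies of x.

    base_regressor maps a batch of inputs to a 1-D array of real outputs;
    alpha is the fraction trimmed from *each* tail before averaging.
    """
    noise = np.random.normal(0.0, sigma, size=(n_samples,) + x.shape)
    outputs = np.sort(base_regressor(x[None, ...] + noise))  # shape (n_samples,)
    k = int(alpha * n_samples)               # samples trimmed per tail
    return outputs[k:n_samples - k].mean()   # trimmed mean is robust to outliers

# Toy usage with a linear base regressor (sum of coordinates):
x = np.ones(4)
print(alpha_trimmed_smooth(lambda batch: batch.sum(axis=1), x))
```

Trimming the tails before averaging is what buys robustness here: a few wildly wrong outputs induced by a perturbation cannot move the trimmed mean far.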
Adaptive Randomized Smoothing: Certified Adversarial Robustness for Multi-Step Defences
We propose Adaptive Randomized Smoothing (ARS) to certify the predictions of our test-time adaptive models against adversarial examples. ARS extends the analysis of randomized smoothing using f-Differential Privacy to certify the adaptive composition of multiple steps. For the first time, our theory covers the sound adaptive composition of general and high-dimensional functions of noisy inputs. We instantiate ARS on deep image classification to certify predictions against adversarial examples of bounded L_{\infty} norm. In the L_{\infty} threat model, ARS enables flexible adaptation through high-dimensional input-dependent masking. We design adaptivity benchmarks based on CIFAR-10 and CelebA, and show that ARS improves standard test accuracy by 1 to 15 percentage points. On ImageNet, ARS improves certified test accuracy by up to 1.6 percentage points over standard RS without adaptivity.
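As a rough illustration of the multi-step structure ARS certifies, the sketch below runs a hypothetical two-step pipeline: a mask is computed from one noisy view of the input, and a classifier then sees the masked input under fresh noise. All names and the way noise is split across steps are assumptions for illustration; the certificate itself comes from the f-DP composition analysis, which is not shown.

```python
# Hedged sketch of a two-step adaptive pipeline of the kind ARS certifies.
# mask_model, classifier, and the noise split are illustrative assumptions.
import torch

def ars_predict(mask_model, classifier, x, sigma=0.5):
    # Step 1: derive an input-dependent mask from one noisy view of x.
    z1 = x + sigma * torch.randn_like(x)
    mask = mask_model(z1).clamp(0.0, 1.0)   # per-pixel weights in [0, 1]
    # Step 2: classify the masked input under fresh, independent noise;
    # composing the two noisy steps is what the f-DP analysis accounts for.
    z2 = mask * x + sigma * torch.randn_like(x)
    return classifier(z2).argmax(dim=-1)
```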
On the Scalability of Certified Adversarial Robustness with Generated Data
Certified defenses against adversarial attacks offer formal guarantees on the robustness of a model, making them more reliable than empirical methods such as adversarial training, whose effectiveness is often later reduced by unseen attacks. Still, the limited certified robustness that is currently achievable has been a bottleneck for their practical adoption. Gowal et al. and Wang et al. have shown that generating additional training data using state-of-the-art diffusion models can considerably improve the robustness of adversarial training. In this work, we demonstrate that a similar approach can substantially improve deterministic certified defenses but also reveal notable differences in the scaling behavior between certified and empirical methods. In addition, we provide a list of recommendations to scale the robustness of certified training approaches.
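A common way to use such generated data is to mix it with the original training set at a fixed ratio within each batch; the sketch below shows this, with the loader interfaces and the 70% generated fraction as illustrative assumptions rather than the papers' exact recipes.

```python
# Minimal sketch: assemble a training batch from real and diffusion-generated
# samples at a fixed ratio. Iterator interfaces and gen_frac are assumptions.
import torch

def mixed_batch(real_iter, generated_iter, batch_size=256, gen_frac=0.7):
    """Build one (inputs, labels) batch from two (inputs, labels) iterators."""
    n_gen = int(gen_frac * batch_size)
    xr, yr = next(real_iter)
    xg, yg = next(generated_iter)
    x = torch.cat([xr[: batch_size - n_gen], xg[:n_gen]])
    y = torch.cat([yr[: batch_size - n_gen], yg[:n_gen]])
    perm = torch.randperm(len(x))   # shuffle so real and generated data mix
    return x[perm], y[perm]
```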
Reviews: Certified Adversarial Robustness with Additive Noise
I have just some further comments. The certified bound for L_\infty = 0.3 on MNIST shown in Figure 2 appears to be approximately 70% accuracy? Whereas TRADES seems to be closer to 100% and Gowal et al. is above 90%; it seems low compared to the numbers I am used to. This might be due to the bound being too loose. I definitely agree that the goal of the adversary is to find an image whose difference is imperceptible to the human eye; however, when the perturbation radius is larger, we should be less sure that **all** images within this space are perceptually indistinguishable from the original.
Certified Adversarial Robustness of Machine Learning-based Malware Detectors via (De)Randomized Smoothing
Gibert, Daniel; Demetrio, Luca; Zizzo, Giulio; Le, Quan; Planes, Jordi; Biggio, Battista
Deep learning-based malware detection systems are vulnerable to adversarial EXEmples - carefully crafted malicious programs that evade detection with minimal perturbation. As such, the community is dedicating effort to developing mechanisms to defend against adversarial EXEmples. However, current randomized smoothing-based defenses are still vulnerable to attacks that inject blocks of adversarial content. In this paper, we introduce a certifiable defense against patch attacks that guarantees, for a given executable and an adversarial patch size, that no adversarial EXEmple exists. Our method is inspired by (de)randomized smoothing, which provides deterministic robustness certificates. During training, a base classifier is trained on subsets of contiguous bytes. At inference time, our defense splits the executable into non-overlapping chunks, classifies each chunk independently, and computes the final prediction through majority voting to minimize the influence of injected content. Furthermore, we introduce a preprocessing step that fixes the size of the sections and headers to a multiple of the chunk size. As a consequence, the injected content is confined to an integer number of chunks without tampering with the other chunks containing the real bytes of the input examples, allowing us to extend our certified robustness guarantees to content-insertion attacks. We perform an extensive ablation study, comparing our defense with randomized smoothing-based defenses against a plethora of content-manipulation attacks and neural network architectures. Results show that our method exhibits unmatched robustness against strong content-insertion attacks, outperforming the randomized smoothing-based defenses in the literature.
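The chunk-and-vote mechanism can be sketched in a few lines. The chunk size, padding byte, and base classifier interface below are illustrative assumptions, and the certificate comment is a simplified version of the guarantee (the real system also fixes section and header sizes in preprocessing).

```python
# Minimal sketch of chunk-based (de)randomized smoothing for executables.
# chunk_size, padding, and base_classifier are illustrative assumptions.
import numpy as np

def certify_and_classify(base_classifier, exe_bytes, chunk_size=512):
    """Split an executable into non-overlapping chunks and majority-vote.

    base_classifier maps a (n_chunks, chunk_size) uint8 array to a vector
    of per-chunk labels in {0: benign, 1: malicious}.
    """
    pad = (-len(exe_bytes)) % chunk_size
    data = np.frombuffer(exe_bytes + b"\x00" * pad, dtype=np.uint8)
    chunks = data.reshape(-1, chunk_size)
    votes = np.bincount(base_classifier(chunks), minlength=2)
    prediction = int(votes.argmax())
    # Simplified certificate: an injected patch of p bytes can flip at most
    # about ceil(p / chunk_size) + 1 chunk votes, so the prediction stands
    # whenever the vote margin exceeds twice that number.
    margin = abs(int(votes[1]) - int(votes[0]))
    return prediction, margin
```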
Certified Adversarial Robustness with Additive Noise
Li, Bai; Chen, Changyou; Wang, Wenlin; Carin, Lawrence
The existence of adversarial data examples has drawn significant attention in the deep-learning community; such data are seemingly minimally perturbed relative to the original data, but lead to very different outputs from a deep-learning algorithm. Although a significant body of work on defense models has been developed, most such models are heuristic and are often vulnerable to adaptive attacks. Defensive methods that provide theoretical robustness guarantees have been studied intensively, yet most fail to obtain non-trivial robustness when large-scale models and data are present. To address these limitations, we introduce a framework that is scalable and provides certified bounds on the norm of the input manipulation needed to construct adversarial examples. We establish a connection between robustness against adversarial perturbation and additive random noise, and propose a training strategy that can significantly improve the certified bounds.
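One plausible instantiation of the training strategy the abstract alludes to is to fit the model under additive Gaussian noise while penalizing divergence from the clean prediction; the noise scale, the KL penalty, and the exact loss form below are assumptions, not the paper's precise objective.

```python
# Hedged sketch: training under additive Gaussian noise with a stability
# term. sigma, lam, and the loss form are illustrative assumptions.
import torch
import torch.nn.functional as F

def noisy_stability_loss(model, x, y, sigma=0.25, lam=1.0):
    logits_clean = model(x)
    logits_noisy = model(x + sigma * torch.randn_like(x))
    task = F.cross_entropy(logits_noisy, y)   # fit labels under noise
    stability = F.kl_div(                     # keep noisy output near clean
        F.log_softmax(logits_noisy, dim=-1),
        F.softmax(logits_clean, dim=-1),
        reduction="batchmean",
    )
    return task + lam * stability
```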
Certified Adversarial Robustness via Randomized Smoothing
Cohen, Jeremy M; Rosenfeld, Elan; Kolter, J. Zico
Recent work has shown that any classifier which classifies well under Gaussian noise can be leveraged to create a new classifier that is provably robust to adversarial perturbations in L_2 norm. However, existing guarantees for such classifiers are suboptimal. In this work we provide the first tight analysis of this "randomized smoothing" technique. We then demonstrate that this extremely simple method outperforms by a wide margin all other provably L_2-robust classifiers proposed in the literature. Furthermore, we train an ImageNet classifier with, for example, a certified top-1 accuracy of 49% under adversarial perturbations with L_2 norm less than 0.5 (=127/255). No other provable adversarial defense has been shown to be feasible on ImageNet. While randomized smoothing with Gaussian noise only confers robustness in L_2 norm, the empirical success of the approach suggests that provable methods based on randomization at test time are a promising direction for future research into adversarially robust classification. Code and trained models are available at https://github.com/locuslab/smoothing.
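For reference, the certification step reduces to a lower confidence bound on the top-class probability under noise, plugged into the radius formula R = \sigma \Phi^{-1}(p_A). The sketch below is a simplified single-stage version (the authors' repo uses a two-stage procedure with a separate selection sample and explicit abstention); the sample count and failure rate are illustrative.

```python
# Simplified sketch of the L2 certificate from randomized smoothing.
# n and alpha are illustrative; see github.com/locuslab/smoothing for the
# full two-stage procedure.
import numpy as np
from scipy.stats import beta, norm

def certify_l2(base_classifier, x, sigma=0.5, n=10000, alpha=0.001):
    """base_classifier maps a batch of noisy inputs to integer class labels."""
    noisy = x[None, ...] + sigma * np.random.randn(n, *x.shape)
    counts = np.bincount(base_classifier(noisy))
    top = int(counts.argmax())
    k = int(counts[top])
    # One-sided Clopper-Pearson lower bound on the top-class probability.
    p_lower = beta.ppf(alpha, k, n - k + 1)
    if p_lower <= 0.5:
        return top, 0.0                     # abstain: no nontrivial certificate
    return top, sigma * norm.ppf(p_lower)   # certified L2 radius
```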