Not enough data to create a plot.
Try a different view from the menu above.
Sharma, Yash
GenAttack: Practical Black-box Attacks with Gradient-Free Optimization
Alzantot, Moustafa, Sharma, Yash, Chakraborty, Supriyo, Srivastava, Mani
Deep neural networks (DNNs) are vulnerable to adversarial examples, even in the black-box case, where the attacker is limited to solely query access. Existing blackbox approaches to generating adversarial examples typically require a significant amount of queries, either for training a substitute network or estimating gradients from the output scores. We introduce GenAttack, a gradient-free optimization technique which uses genetic algorithms for synthesizing adversarial examples in the black-box setting. Our experiments on the MNIST, CIFAR-10, and ImageNet datasets show that GenAttack can successfully generate visually imperceptible adversarial examples against state-of-the-art image recognition models with orders of magnitude fewer queries than existing approaches. For example, in our CIFAR-10 experiments, GenAttack required roughly 2,568 times less queries than the current state-of-the-art black-box attack. Furthermore, we show that GenAttack can successfully attack both the state-of-the-art ImageNet defense, ensemble adversarial training, and non-differentiable, randomized input transformation defenses. GenAttack's success against ensemble adversarial training demonstrates that its query efficiency enables it to exploit the defense's weakness to direct black-box attacks. GenAttack's success against non-differentiable input transformations indicates that its gradient-free nature enables it to be applicable against defenses which perform gradient masking/obfuscation to confuse the attacker. Our results suggest that population-based optimization opens up a promising area of research into effective gradient-free black-box attacks.
Attacking the Madry Defense Model with $L_1$-based Adversarial Examples
Sharma, Yash, Chen, Pin-Yu
The Madry Lab recently hosted a competition designed to test the robustness of their adversarially trained MNIST model. Attacks were constrained to perturb each pixel of the input image by a scaled maximal $L_\infty$ distortion $\epsilon$ = 0.3. This discourages the use of attacks which are not optimized on the $L_\infty$ distortion metric. Our experimental results demonstrate that by relaxing the $L_\infty$ constraint of the competition, the elastic-net attack to deep neural networks (EAD) can generate transferable adversarial examples which, despite their high average $L_\infty$ distortion, have minimal visual distortion. These results call into question the use of $L_\infty$ as a sole measure for visual distortion, and further demonstrate the power of EAD at generating robust adversarial examples.
Bypassing Feature Squeezing by Increasing Adversary Strength
Sharma, Yash, Chen, Pin-Yu
Feature Squeezing is a recently proposed defense method which reduces the search space available to an adversary by coalescing samples that correspond to many different feature vectors in the original space into a single sample. It has been shown that feature squeezing defenses can be combined in a joint detection framework to achieve high detection rates against state-of-the-art attacks. However, we demonstrate on the MNIST and CIFAR-10 datasets that by increasing the adversary strength of said state-of-the-art attacks, one can bypass the detection framework with adversarial examples of minimal visual distortion. These results suggest for proposed defenses to validate against stronger attack configurations.
EAD: Elastic-Net Attacks to Deep Neural Networks via Adversarial Examples
Chen, Pin-Yu, Sharma, Yash, Zhang, Huan, Yi, Jinfeng, Hsieh, Cho-Jui
Recent studies have highlighted the vulnerability of deep neural networks (DNNs) to adversarial examples - a visually indistinguishable adversarial image can easily be crafted to cause a well-trained model to misclassify. Existing methods for crafting adversarial examples are based on $L_2$ and $L_\infty$ distortion metrics. However, despite the fact that $L_1$ distortion accounts for the total variation and encourages sparsity in the perturbation, little has been developed for crafting $L_1$-based adversarial examples. In this paper, we formulate the process of attacking DNNs via adversarial examples as an elastic-net regularized optimization problem. Our elastic-net attacks to DNNs (EAD) feature $L_1$-oriented adversarial examples and include the state-of-the-art $L_2$ attack as a special case. Experimental results on MNIST, CIFAR10 and ImageNet show that EAD can yield a distinct set of adversarial examples with small $L_1$ distortion and attains similar attack performance to the state-of-the-art methods in different attack scenarios. More importantly, EAD leads to improved attack transferability and complements adversarial training for DNNs, suggesting novel insights on leveraging $L_1$ distortion in adversarial machine learning and security implications of DNNs.
EAD: Elastic-Net Attacks to Deep Neural Networks via Adversarial Examples
Chen, Pin-Yu (IBM Research AI) | Sharma, Yash (The Cooper Union, New York) | Zhang, Huan (University of California, Davis) | Yi, Jinfeng (Tencent AI Lab) | Hsieh, Cho-Jui (University of California, Davis)
Recent studies have highlighted the vulnerability of deep neural networks (DNNs) to adversarial examples — a visually indistinguishable adversarial image can easily be crafted to cause a well-trained model to misclassify. Existing methods for crafting adversarial examples are based on L 2 and L ∞ distortion metrics. However, despite the fact that L 1 distortion accounts for the total variation and encourages sparsity in the perturbation, little has been developed for crafting L 1 -based adversarial examples. In this paper, we formulate the process of attacking DNNs via adversarial examples as an elastic-net regularized optimization problem. Our elastic-net attacks to DNNs (EAD) feature L 1 -oriented adversarial examples and include the state-of-the-art L 2 attack as a special case. Experimental results on MNIST, CIFAR10 and ImageNet show that EAD can yield a distinct set of adversarial examples with small L 1 distortion and attains similar attack performance to the state-of-the-art methods in different attack scenarios. More importantly, EAD leads to improved attack transferability and complements adversarial training for DNNs, suggesting novel insights on leveraging L 1 distortion in adversarial machine learning and security implications of DNNs.
ZOO: Zeroth Order Optimization based Black-box Attacks to Deep Neural Networks without Training Substitute Models
Chen, Pin-Yu, Zhang, Huan, Sharma, Yash, Yi, Jinfeng, Hsieh, Cho-Jui
Deep neural networks (DNNs) are one of the most prominent technologies of our time, as they achieve state-of-the-art performance in many machine learning tasks, including but not limited to image classification, text mining, and speech processing. However, recent research on DNNs has indicated ever-increasing concern on the robustness to adversarial examples, especially for security-critical tasks such as traffic sign identification for autonomous driving. Studies have unveiled the vulnerability of a well-trained DNN by demonstrating the ability of generating barely noticeable (to both human and machines) adversarial images that lead to misclassification. Furthermore, researchers have shown that these adversarial images are highly transferable by simply training and attacking a substitute model built upon the target model, known as a black-box attack to DNNs. Similar to the setting of training substitute models, in this paper we propose an effective black-box attack that also only has access to the input (images) and the output (confidence scores) of a targeted DNN. However, different from leveraging attack transferability from substitute models, we propose zeroth order optimization (ZOO) based attacks to directly estimate the gradients of the targeted DNN for generating adversarial examples. We use zeroth order stochastic coordinate descent along with dimension reduction, hierarchical attack and importance sampling techniques to efficiently attack black-box models. By exploiting zeroth order optimization, improved attacks to the targeted DNN can be accomplished, sparing the need for training substitute models and avoiding the loss in attack transferability. Experimental results on MNIST, CIFAR10 and ImageNet show that the proposed ZOO attack is as effective as the state-of-the-art white-box attack and significantly outperforms existing black-box attacks via substitute models.