Despite achieving remarkable success in various domains, recent studies have uncovered the vulnerability of deep neural networks to adversarial perturbations, creating concerns on model generalizability and new threats such as prediction-evasive misclassification or stealthy reprogramming. Among different defense proposals, stochastic network defenses such as random neuron activation pruning or random perturbation to layer inputs are shown to be promising for attack mitigation. However, one critical drawback of current defenses is that the robustness enhancement is at the cost of noticeable performance degradation on legitimate data, e.g., large drop in test accuracy. This paper is motivated by pursuing for a better trade-off between adversarial robustness and test accuracy for stochastic network defenses. We propose Defense Efficiency Score (DES), a comprehensive metric that measures the gain in unsuccessful attack attempts at the cost of drop in test accuracy of any defense. To achieve a better DES, we propose hierarchical random switching (HRS), which protects neural networks through a novel randomization scheme. A HRS-protected model contains several blocks of randomly switching channels to prevent adversaries from exploiting fixed model structures and parameters for their malicious purposes. Extensive experiments show that HRS is superior in defending against state-of-the-art white-box and adaptive adversarial misclassification attacks. We also demonstrate the effectiveness of HRS in defending adversarial reprogramming, which is the first defense against adversarial programs. Moreover, in most settings the average DES of HRS is at least 5X higher than current stochastic network defenses, validating its significantly improved robustness-accuracy trade-off.
Powerful adversarial attack methods are vital for understanding how to construct robust deep neural networks (DNNs) and for thoroughly testing defense techniques. In this paper, we propose a black-box adversarial attack algorithm that can defeat both vanilla DNNs and those generated by various defense techniques developed recently. Instead of searching for an "optimal" adversarial example for a benign input to a targeted DNN, our algorithm finds a probability density distribution over a small region centered around the input, such that a sample drawn from this distribution is likely an adversarial example, without the need of accessing the DNN's internal layers or weights. Our approach is universal as it can successfully attack different neural networks by a single algorithm. It is also strong; according to the testing against 2 vanilla DNNs and 13 defended ones, it outperforms state-of-the-art black-box or white-box attack methods for most test cases. Additionally, our results reveal that adversarial training remains one of the best defense techniques, and the adversarial examples are not as transferable across defended DNNs as them across vanilla DNNs.
DeepRobust is a PyTorch  adversarial learning library which aims to build a comprehensive and easy-to-use platform to foster this research field. It currently contains more than 10 attack algorithms and 8 defense algorithms in image domain and 9 attack algorithms and 4 defense algorithms in graph domain, under a variety of deep learning architectures. In this manual, we introduce the main contents of DeepRobust with detailed instructions. The library is kept updated and can be found at https: // github.
Nowadays, Deep Learning as a service can be deployed in Internet of Things (IoT) to provide smart services and sensor data processing. However, recent research has revealed that some Deep Neural Networks (DNN) can be easily misled by adding relatively small but adversarial perturbations to the input (e.g., pixel mutation in input images). One challenge in defending DNN against these attacks is to efficiently identifying and filtering out the adversarial pixels. The state-of-the-art defense strategies with good robustness often require additional model training for specific attacks. To reduce the computational cost without loss of generality, we present a defense strategy called a progressive defense against adversarial attacks (PDAAA) for efficiently and effectively filtering out the adversarial pixel mutations, which could mislead the neural network towards erroneous outputs, without a-priori knowledge about the attack type. We evaluated our progressive defense strategy against various attack methods on two well-known datasets. The result shows it outperforms the state-of-the-art while reducing the cost of model training by 50% on average.
Algorithmic defenses against adversarial examples remain an extremely open and challenging problem, with recent state-of-the-art defenses on ImageNet still achieving only 27.9% and 46.7% top-1 accuracy for white- and black-box PGD attacks, respectively, as of March 2018 \citepkannan2018adversarial. Unfortunately, despite the explosive emergence of defense strategies, there does not appear to be an easy algorithmic fix for the adversarial problem available in the short term. For example, one recent analysis investigated a series of promising methods that relied on gradient obfuscation, and demonstrated that they could be quickly broken \citepathalye2018obfuscated. Despite this, we also note that principled approaches to adversarial robustness are beginning to show promise. For example, several papers have demonstrated what appears to be both high accuracy and strong adversarial robustness on smaller datasets such as MNIST, \citepmadry2017towards,kannan2018adversarial, and there have also been several results including theoretical guarantees of adversarial robustness, albeit on small datasets and/or with still-insufficient accuracy \citepkolter2017provable, raghunathan2018certified, dvijotham2018dual.