We propose denoising dictionary learning (DDL), a simple yet effective technique for protecting against adversarial perturbations. We examined DDL on MNIST and CIFAR10 perturbed with two different techniques, the fast gradient sign method (FGSM) and the Jacobian saliency map attack (JSMA). We evaluated it against five different deep neural networks (DNNs) that represent the building blocks of most recent architectures and form a progression of increasing model complexity. We show that each model tends to capture different representations depending on its architecture. For each model we recorded its accuracy both on the perturbed test data previously misclassified with high confidence and on the denoised data after reconstruction using dictionary learning. The reconstruction quality of each data point is assessed by means of the Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity Index (SSI). We show that after applying DDL the reconstruction of the original data point from a noisy
Deep neural networks, although shown to be a successful class of machine learning algorithms, are known to be extremely unstable to adversarial perturbations. Improving the robustness of neural networks against these attacks is important, especially for security-critical applications. To defend against such attacks, we propose dividing the input image into multiple patches, denoising each patch independently, and reconstructing the image, without losing significant image content. We call our method D3. The proposed defense mechanism is non-differentiable, which makes it non-trivial for an adversary to apply gradient-based attacks. Moreover, we do not fine-tune the network with adversarial examples, making it more robust against unknown attacks. We present an analysis of the tradeoff between accuracy and robustness against adversarial attacks. We evaluate our method under black-box, grey-box, and white-box settings. On the ImageNet dataset, our method outperforms the state-of-the-art by 19.7% under the grey-box setting and performs comparably under the black-box setting. For the white-box setting, the proposed method achieves 34.4% accuracy compared to the 0% reported in recent work.
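The divide, denoise, and reconstruct pipeline described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the D3 implementation: the abstract does not specify the per-patch denoiser, so a rank-limited SVD approximation stands in for it here, and the function names (`split_into_patches`, `merge_patches`, `denoise_patch`, `patchwise_denoise`), the patch size, and the rank are all hypothetical choices.

```python
import numpy as np

def split_into_patches(image, patch):
    """Split an (H, W) image into non-overlapping (patch, patch) tiles."""
    h, w = image.shape
    if h % patch or w % patch:
        raise ValueError("image sides must be multiples of the patch size")
    tiles = image.reshape(h // patch, patch, w // patch, patch)
    return tiles.transpose(0, 2, 1, 3)  # shape: (rows, cols, patch, patch)

def merge_patches(tiles):
    """Inverse of split_into_patches: stitch the tiles back into one image."""
    rows, cols, patch, _ = tiles.shape
    return tiles.transpose(0, 2, 1, 3).reshape(rows * patch, cols * patch)

def denoise_patch(tile, rank=1):
    """Stand-in denoiser (an assumption): keep only the top singular values."""
    u, s, vt = np.linalg.svd(tile, full_matrices=False)
    s[rank:] = 0.0
    return (u * s) @ vt

def patchwise_denoise(image, patch=8, rank=1):
    """Denoise every patch independently, then reconstruct the image."""
    tiles = split_into_patches(image, patch)
    cleaned = np.empty_like(tiles)
    for r in range(tiles.shape[0]):
        for c in range(tiles.shape[1]):
            cleaned[r, c] = denoise_patch(tiles[r, c], rank)
    return merge_patches(cleaned)

# Toy input: a 32x32 image made of 16 constant 8x8 blocks, plus Gaussian noise.
rng = np.random.default_rng(0)
clean = np.kron(rng.uniform(0.2, 1.0, size=(4, 4)), np.ones((8, 8)))
noisy = clean + rng.normal(scale=0.05, size=clean.shape)
restored = patchwise_denoise(noisy, patch=8, rank=1)
```

Because each patch is processed on its own, the loop body could be swapped for any other per-patch denoiser without touching the split and merge steps.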
This paper proposes CuRTAIL, an end-to-end computing framework for characterizing and thwarting adversarial space in the context of Deep Learning (DL). The framework protects deep neural networks against adversarial samples, which are perturbed inputs carefully crafted by malicious entities to mislead the underlying DL model. The precursor for the proposed methodology is a set of new quantitative metrics to assess the vulnerability of various deep learning architectures to adversarial samples. CuRTAIL formalizes the goal of preventing adversarial samples as a minimization of the space unexplored by the pertinent DL model, which is characterized in CuRTAIL's vulnerability analysis step. To thwart adversarial machine learning attacks, CuRTAIL introduces the concept of Modular Robust Redundancy (MRR) as a viable solution to achieve the formalized minimization objective. The MRR methodology explicitly characterizes the geometry of the input data and the DL model parameters. It then learns a set of complementary but disjoint models which maximally cover the unexplored subspaces of the target DL model, thus reducing the risk of integrity attacks. We extensively evaluate CuRTAIL's performance against state-of-the-art attack models including Fast Gradient Sign, Jacobian Saliency Map Attack, Deepfool, and Carlini & Wagner L2. Proof-of-concept implementations for analyzing various data collections including MNIST, CIFAR10, and ImageNet corroborate CuRTAIL's effectiveness in detecting adversarial samples in different settings. The computations in each MRR module can be performed independently. As such, CuRTAIL's detection algorithm can be completely parallelized among multiple hardware settings to achieve maximum throughput. We further provide an accompanying API to facilitate the adoption of the proposed framework for various applications.
Deep neural networks are built with generalization beyond the training set in mind, using techniques such as regularization, early stopping, and dropout. However, measures to make them more resilient to adversarial examples are rarely taken. As deep neural networks become more prevalent in mission-critical and real-time systems, miscreants have started to attack them by intentionally causing them to misclassify an object of one type as another. This can be catastrophic in scenarios where the classification made by a deep neural network can lead to a fatal decision by a machine. In this work, we used the GTSRB dataset to craft adversarial samples with the Fast Gradient Sign Method and the Jacobian Saliency Method, used those samples to attack another deep convolutional neural network, and made the attacked network more resilient to adversarial attacks through Defensive Distillation and Adversarial Training.
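As a rough illustration of the Fast Gradient Sign Method named in this abstract, the sketch below perturbs an input by a step of size eps in the direction of the sign of the loss gradient with respect to that input. Everything concrete here is an assumption made for the example: the victim is a toy logistic-regression model rather than a convolutional network trained on GTSRB, and the names `sigmoid`, `cross_entropy`, and `fgsm`, the weights, and the value of `eps` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(x, y, w, b):
    """Binary cross-entropy of a logistic model on one input x with label y."""
    p = sigmoid(x @ w + b)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def fgsm(x, y, w, b, eps):
    """One FGSM step: x_adv = clip(x + eps * sign(dLoss/dx), 0, 1)."""
    p = sigmoid(x @ w + b)
    grad_x = (p - y) * w  # analytic input gradient for the logistic model
    return np.clip(x + eps * np.sign(grad_x), 0.0, 1.0)

# Toy model and input (assumed values, for illustration only).
rng = np.random.default_rng(0)
w = rng.normal(size=16)
b = 0.1
x = rng.uniform(0.2, 0.8, size=16)
y = 1.0

x_adv = fgsm(x, y, w, b, eps=0.1)
loss_before = cross_entropy(x, y, w, b)
loss_after = cross_entropy(x_adv, y, w, b)
```

For a deep network the analytic gradient line would be replaced by a gradient obtained through automatic differentiation; the sign step and the clipping to the valid pixel range stay the same.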
Neural networks play an increasingly important role in the field of machine learning and are included in many applications in society. Unfortunately, neural networks are vulnerable to adversarial samples generated to attack them. However, most generation approaches either assume that the attacker has full knowledge of the neural network model or are limited by the type of attacked model. In this paper, we propose a new approach that generates a black-box attack on neural networks based on the swarm evolutionary algorithm. Benefiting from improvements in the technology and theoretical characteristics of evolutionary algorithms, our approach has the advantages of effectiveness, black-box operation, generality, and randomness. Our experimental results show that both the MNIST images and the CIFAR-10 images can be perturbed to successfully generate a black-box attack with 100% probability on average. In addition, the proposed attack, which is successful on distilled neural networks with almost 100% probability, is resistant to defensive distillation. The experimental results also indicate that the robustness of the artificial intelligence algorithm is related to the complexity of the model and the data set. In addition, we find that the adversarial samples to some extent reproduce the characteristics of the sample data learned by the neural network model.
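To make the idea of a swarm-based black-box attack concrete, the sketch below runs a generic particle swarm search that only queries the model's output confidence and never touches its gradients or parameters. It is not the algorithm proposed in the paper, whose details the abstract does not give: the victim model (`black_box_confidence`), the function `swarm_attack`, the L-infinity budget `eps`, and the swarm coefficients are all assumptions chosen for illustration.

```python
import numpy as np

def black_box_confidence(x, w, b, y):
    """Hypothetical victim: returns only the probability assigned to label y."""
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    return p if y == 1 else 1.0 - p

def swarm_attack(query, x, eps, n_particles=20, n_iters=50, seed=0):
    """Particle swarm search for a perturbation that lowers query(x + d)."""
    rng = np.random.default_rng(seed)
    # Each particle is a perturbation inside the L-infinity ball of radius eps.
    pos = rng.uniform(-eps, eps, size=(n_particles, x.size))
    pos[0] = 0.0  # keep the unperturbed input so the result is never worse
    vel = np.zeros_like(pos)
    score = np.array([query(np.clip(x + d, 0.0, 1.0)) for d in pos])
    best_pos, best_score = pos.copy(), score.copy()
    g = int(np.argmin(best_score))
    g_pos, g_score = best_pos[g].copy(), best_score[g]
    for _ in range(n_iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        # Standard velocity update: inertia + pull to personal and global bests.
        vel = 0.7 * vel + 1.5 * r1 * (best_pos - pos) + 1.5 * r2 * (g_pos - pos)
        pos = np.clip(pos + vel, -eps, eps)
        score = np.array([query(np.clip(x + d, 0.0, 1.0)) for d in pos])
        improved = score < best_score
        best_pos[improved] = pos[improved]
        best_score[improved] = score[improved]
        g = int(np.argmin(best_score))
        if best_score[g] < g_score:
            g_pos, g_score = best_pos[g].copy(), best_score[g]
    return np.clip(x + g_pos, 0.0, 1.0), g_score

# Toy victim and input (assumed values, for illustration only).
rng = np.random.default_rng(1)
w, b = rng.normal(size=16), 0.0
x = rng.uniform(0.2, 0.8, size=16)
query = lambda z: black_box_confidence(z, w, b, y=1)

x_adv, adv_confidence = swarm_attack(query, x, eps=0.1)
```

The attack interacts with the model only through `query`, which is what makes it black-box; any classifier that exposes a confidence score could be substituted for the toy model.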