Recent studies on the adversarial vulnerability of neural networks have shown that models trained with the objective of minimizing an upper bound on the worst-case loss over all possible adversarial perturbations improve robustness against adversarial attacks. Beside exploiting adversarial training framework, we show that by enforcing a Deep Neural Network (DNN) to be linear in transformed input and feature space improves robustness significantly. We also demonstrate that by augmenting the objective function with Local Lipschitz regularizer boost robustness of the model further. Our method outperforms most sophisticated adversarial training methods and achieves state of the art adversarial accuracy on MNIST, CIFAR10 and SVHN dataset. In this paper, we also propose a novel adversarial image generation method by leveraging Inverse Representation Learning and Linearity aspect of an adversarially trained deep neural network classifier.
Recent efforts show that neural networks are vulnerable to small but intentional perturbations on input features in visual classification tasks. Due to the additional consideration of connections between examples (e.g., articles with citation link tend to be in the same class), graph neural networks could be more sensitive to the perturbations, since the perturbations from connected examples exacerbate the impact on a target example. Adversarial Training (AT), a dynamic regularization technique, can resist the worst-case perturbations on input features and is a promising choice to improve model robustness and generalization. However, existing AT methods focus on standard classification, being less effective when training models on graph since it does not model the impact from connected examples. In this work, we explore adversarial training on graph, aiming to improve the robustness and generalization of models learned on graph. We propose Graph Adversarial Training (GAT), which takes the impact from connected examples into account when learning to construct and resist perturbations. We give a general formulation of GAT, which can be seen as a dynamic regularization scheme based on the graph structure. To demonstrate the utility of GAT, we employ it on a state-of-the-art graph neural network model --- Graph Convolutional Network (GCN). We conduct experiments on two citation graphs (Citeseer and Cora) and a knowledge graph (NELL), verifying the effectiveness of GAT which outperforms normal training on GCN by 4.51% in node classification accuracy. Codes will be released upon acceptance.
Neural networks are vulnerable to adversarial attacks -- small visually imperceptible crafted noise which when added to the input drastically changes the output. The most effective method of defending against these adversarial attacks is to use the methodology of adversarial training. We analyze the adversarially trained robust models to study their vulnerability against adversarial attacks at the level of the latent layers. Our analysis reveals that contrary to the input layer which is robust to adversarial attack, the latent layer of these robust models are highly susceptible to adversarial perturbations of small magnitude. Leveraging this information, we introduce a new technique Latent Adversarial Training (LAT) which comprises of fine-tuning the adversarially trained models to ensure the robustness at the feature layers. We also propose Latent Attack (LA), a novel algorithm for construction of adversarial examples. LAT results in minor improvement in test accuracy and leads to a state-of-the-art adversarial accuracy against the universal first-order adversarial PGD attack which is shown for the MNIST, CIFAR-10, CIFAR-100 datasets.
Adversarial training is the most successful empirical method, to increase the robustness of neural networks against adversarial attacks yet. Unfortunately, this higher robustness is accompanied by considerably higher computational complexity. To date, only adversarial training with expensive multi-step adversarial attacks like Projected Gradient Descent (PGD) proved effective against equally strong attacks. In this paper, we present two ideas that combined enable adversarial training with the computationally less expensive Fast Gradient Sign Method (FGSM). First, we add uniform noise to the initial data point of the FGSM attack, which creates a wider variety of stronger adversaries. Further, we add a learnable regularization step prior to the neural network called Stochastic Augmentation Layer (SAL). Inputs propagated trough the SAL are resampled from a Gaussian distribution. The randomness of the resampling at inference time makes it more complicated for the attacker to construct an adversarial example since the outcome of the model is not known in advance. We show that noise injection in conjunction with FGSM adversarial training achieves comparable results to adversarial training with PGD while being orders of magnitude faster. Moreover, we show superior results in comparison to PGD-based training when combining noise injection and SAL.
Deep neural networks have been shown to be susceptible to adversarial examples -- small, imperceptible changes constructed to cause mis-classification in otherwise highly accurate image classifiers. As a practical alternative, recent work proposed so-called adversarial patches: clearly visible, but adversarially crafted rectangular patches in images. These patches can easily be printed and applied in the physical world. While defenses against imperceptible adversarial examples have been studied extensively, robustness against adversarial patches is poorly understood. In this work, we first devise a practical approach to obtain adversarial patches while actively optimizing their location within the image. Then, we apply adversarial training on these location-optimized adversarial patches and demonstrate significantly improved robustness on CIFAR10 and GTSRB. Additionally, in contrast to adversarial training on imperceptible adversarial examples, our adversarial patch training does not reduce accuracy.