natural adversarial example
Generating Valid and Natural Adversarial Examples with Large Language Models
Wang, Zimu, Wang, Wei, Chen, Qi, Wang, Qiufeng, Nguyen, Anh
Deep learning-based natural language processing (NLP) models, particularly pre-trained language models (PLMs), have been revealed to be vulnerable to adversarial attacks. However, the adversarial examples generated by many mainstream word-level adversarial attack models are neither valid nor natural, leading to the loss of semantic maintenance, grammaticality, and human imperceptibility. Based on the exceptional capacity of language understanding and generation of large language models (LLMs), we propose LLM-Attack, which aims at generating both valid and natural adversarial examples with LLMs. The method consists of two stages: word importance ranking (which searches for the most vulnerable words) and word synonym replacement (which substitutes them with their synonyms obtained from LLMs). Experimental results on the Movie Review (MR), IMDB, and Yelp Review Polarity datasets against the baseline adversarial attack models illustrate the effectiveness of LLM-Attack, and it outperforms the baselines in human and GPT-4 evaluation by a significant margin. The model can generate adversarial examples that are typically valid and natural, with the preservation of semantic meaning, grammaticality, and human imperceptibility.
Curating Naturally Adversarial Datasets for Learning-Enabled Medical Cyber-Physical Systems
Pugh, Sydney, Ruchkin, Ivan, Lee, Insup, Weimer, James
Deep learning models have shown promising predictive accuracy for time-series healthcare applications. However, ensuring the robustness of these models is vital for building trustworthy AI systems. Existing research predominantly focuses on robustness to synthetic adversarial examples, crafted by adding imperceptible perturbations to clean input data. However, these synthetic adversarial examples do not accurately reflect the most challenging real-world scenarios, especially in the context of healthcare data. Consequently, robustness to synthetic adversarial examples may not necessarily translate to robustness against naturally occurring adversarial examples, which is highly desirable for trustworthy AI. We propose a method to curate datasets comprised of natural adversarial examples to evaluate model robustness. The method relies on probabilistic labels obtained from automated weakly-supervised labeling that combines noisy and cheap-to-obtain labeling heuristics. Based on these labels, our method adversarially orders the input data and uses this ordering to construct a sequence of increasingly adversarial datasets. Our evaluation on six medical case studies and three non-medical case studies demonstrates the efficacy and statistical validity of our approach to generating naturally adversarial datasets
Adversarial Examples in Deep Learning โ A Primer - KDnuggets
We have seen the advent of state-of-the-art (SOTA) deep learning models for computer vision ever since we started getting bigger and better compute (GPUs and TPUs), more data (ImageNet etc.) and easy to use open-source software and tools (TensorFlow and PyTorch). Every year (and now every few months!) we see the next SOTA deep learning model dethrone the previous model in terms of Top-k accuracy for benchmark datasets. The following figure depicts some of the latest SOTA deep learning vision models (and doesn't depict some like Google's BigTransfer!). However most of these SOTA deep learning models are brought down to their knees when it tries to make predictions on a specific class of images, called as adversarial images. The whole idea of an adversarial example can be a natural example or a synthetic example.
Adversarial Examples in Deep Learning -- A Primer
We have seen the advent of state-of-the-art (SOTA) deep learning models for computer vision ever since we started getting bigger and better compute (GPUs and TPUs), more data (ImageNet etc.) and easy to use open-source software and tools (TensorFlow and PyTorch). Every year (and now every few months!) we see the next SOTA deep learning model dethrone the previous model in terms of Top-k accuracy for benchmark datasets. The following figure depicts some of the latest SOTA deep learning vision models (and doesn't depict some like Google's BigTransfer!). However most of these SOTA deep learning models are brought down to their knees when it tries to make predictions on a specific class of images, called as adversarial images. The whole idea of an adversarial example can be a natural example or a synthetic example.
Generating Natural Adversarial Hyperspectral examples with a modified Wasserstein GAN
Burnel, Jean-Christophe, Fatras, Kilian, Courty, Nicolas
Adversarial examples are a hot topic due to their abilities to fool a classifier's prediction. There are two strategies to create such examples, one uses the attacked classifier's gradients, while the other only requires access to the clas-sifier's prediction. This is particularly appealing when the classifier is not full known (black box model). In this paper, we present a new method which is able to generate natural adversarial examples from the true data following the second paradigm. Based on Generative Adversarial Networks (GANs) [5], it reweights the true data empirical distribution to encourage the classifier to generate ad-versarial examples. We provide a proof of concept of our method by generating adversarial hyperspectral signatures on a remote sensing dataset.
Generate (non-software) Bugs to Fool Classifiers
Yakura, Hiromu, Akimoto, Youhei, Sakuma, Jun
Let us consider a scenario in which an attacker wishes to modify input image x so that the target model f classifies it with the specific label t . The generation process can be represented as follows: ห v argmin v L f ( x v,t) null nullv null, (1) where L f denotes a loss function that represents how distant the input data are from the given label under f and v null null v null is a norm function to regularize the perturbation so that v becomes unnoticeable to humans. Then, x ห v is expected to form an adversarial example that is classified as t while it looks similar to x . Earlier approaches, such as Szegedy et al. (2014) and Moosavi-Dezfooli, Fawzi, and Frossard (2016), used L 2-norm to limit the magnitude of the perturbation. In contrast, Su, V argas, and Sakurai (2017) used L 0-norm to limit the number of modified pixels and showed that even modification of a one-pixel could generate adversarial examples. More recent studies introduced GAN instead of directly optimizing perturbations (Xiao et al. 2018; Zhao, Dua, and Singh 2018) for the purpose of ensuring the naturalness of adversarial examples. For example, Xiao et al. (2018) trained a discriminator network to distinguish adversarial examples from natural images so that the generator network produced adversarial examples that appeared as natural images. Given the distribution p x over the natural images and the tradeoff parameter ฮฑ, its training process can be represented similarly to that in Goodfellow et al. (2014) as follows: min G max D E x p x[log D (x)] E x p x[log (1 D ( x G ( x)))] ฮฑ E x p x[L f (x G (x),t)] .
If you can identify what's in these images, you're smarter than AI
Computer vision has improved massively in recent years, but it's still capable of making serious errors. So much so that there's a whole field of research dedicated to studying pictures that are routinely misidentified by AI, known as "adversarial images." Think of them as optical illusions for computers. While you see a cat up a tree, the AI sees a squirrel. There's a great need to study these images.
Natural Adversarial Examples
Hendrycks, Dan, Zhao, Kevin, Basart, Steven, Steinhardt, Jacob, Song, Dawn
We introduce natural adversarial examples -- real-world, unmodified, and naturally occurring examples that cause classifier accuracy to significantly degrade. We curate 7,500 natural adversarial examples and release them in an ImageNet classifier test set that we call ImageNet-A. This dataset serves as a new way to measure classifier robustness. Like l_p adversarial examples, ImageNet-A examples successfully transfer to unseen or black-box classifiers. For example, on ImageNet-A a DenseNet-121 obtains around 2% accuracy, an accuracy drop of approximately 90%. Recovering this accuracy is not simple because ImageNet-A examples exploit deep flaws in current classifiers including their over-reliance on color, texture, and background cues. We observe that popular training techniques for improving robustness have little effect, but we show that some architectural changes can enhance robustness to natural adversarial examples. Future research is required to enable robust generalization to this hard ImageNet test set.
Robustness properties of Facebook's ResNeXt WSL models
We investigate the robustness properties of ResNeXt image recognition models trained with billion scale weakly-supervised data (ResNeXt WSL models). These models, recently made public by Facebook AI, were trained on 1B images from Instagram and fine-tuned on ImageNet. We show that these models display an unprecedented degree of robustness against common image corruptions and perturbations, as measured by the ImageNet-C and ImageNet-P benchmarks. The largest of the released models, in particular, achieves state-of-the-art results on both ImageNet-C and ImageNet-P by a large margin. The gains on ImageNet-C and ImageNet-P far outpace the gains on ImageNet validation accuracy, suggesting the former as more useful benchmarks to measure further progress in image recognition. Remarkably, the ResNeXt WSL models even achieve a limited degree of adversarial robustness against state-of-the-art white-box attacks (10-step PGD attacks). However, in contrast to adversarially trained models, the robustness of the ResNeXt WSL models rapidly declines with the number of PGD steps, suggesting that these models do not achieve genuine adversarial robustness. Visualization of the learned features also confirms this conclusion. Finally, we show that although the ResNeXt WSL models are more shape-biased than comparable ImageNet-trained models in a shape-texture cue conflict experiment, they still remain much more texture-biased than humans and their accuracy on the recently introduced "natural adversarial examples" (ImageNet-A) also remains low, suggesting that they share many of the underlying characteristics of ImageNet-trained models that make these benchmarks challenging.