Adversarially robust model


Are Labels Required for Improving Adversarial Robustness?

Alayrac, Jean-Baptiste, Uesato, Jonathan, Huang, Po-Sen, Fawzi, Alhussein, Stanforth, Robert, Kohli, Pushmeet

Neural Information Processing Systems

Recent work has uncovered the interesting (and somewhat surprising) finding that training models to be invariant to adversarial perturbations requires substantially larger datasets than those required for standard classification. This result is a key hurdle in the deployment of robust machine learning models in many real-world applications where labeled data is expensive. Our main insight is that unlabeled data can be a competitive alternative to labeled data for training adversarially robust models. Theoretically, we show that in a simple statistical setting, the sample complexity for learning an adversarially robust model from unlabeled data matches the fully supervised case up to constant factors. On standard datasets like CIFAR-10, a simple Unsupervised Adversarial Training (UAT) approach using unlabeled data improves robust accuracy by 21.7% over using 4K supervised examples alone, and captures over 95% of the improvement from the same number of labeled examples. Finally, we report an improvement of 4% over the previous state of the art on CIFAR-10 against the strongest known attack by using additional unlabeled data from the uncurated 80 Million Tiny Images dataset. This demonstrates that our finding also extends to the more realistic case where unlabeled data is uncurated, opening a new avenue for improving adversarial training.
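
As a rough, hedged illustration of the idea (not the paper's exact UAT variants), the PyTorch-style sketch below pseudo-labels unlabeled images with a classifier trained on the small labeled set, then runs standard PGD-based adversarial training against those pseudo-labels; the models, batches, and hyperparameters are placeholders.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Standard L-inf PGD attack around x (illustrative settings, not the paper's exact attack)."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def uat_style_step(robust_model, standard_model, optimizer, x_labeled, y_labeled, x_unlabeled):
    """One training step mixing labeled data with pseudo-labeled unlabeled data."""
    with torch.no_grad():
        # Pseudo-labels from a classifier trained only on the small labeled set.
        y_pseudo = standard_model(x_unlabeled).argmax(dim=1)
    x_all = torch.cat([x_labeled, x_unlabeled])
    y_all = torch.cat([y_labeled, y_pseudo])
    x_adv = pgd_attack(robust_model, x_all, y_all)
    loss = F.cross_entropy(robust_model(x_adv), y_all)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```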


Adversarial Robustness without Adversarial Training: A Teacher-Guided Curriculum Learning Approach

Neural Information Processing Systems

Current state-of-the-art (SOTA) adversarially robust models are mostly based on adversarial training (AT) and differ mainly in the regularizers applied at the inner maximization or outer minimization steps. Because the inner maximization step is iterative, these methods take a very long time to train. We propose a non-iterative method that enforces the following ideas during training. First, attribution maps are more closely aligned with the actual object in the image for adversarially robust models than for naturally trained models. Second, the set of pixels allowed to perturb an image (and thereby change the model's decision) should be restricted to the object pixels only, which reduces the attack strength by limiting the attack space.
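
A minimal sketch of the second idea, assuming a binary object mask is already available (e.g., from a segmentation or attribution map; the mask source and attack settings here are assumptions, not the paper's procedure): the adversarial perturbation is simply zeroed outside the object pixels.

```python
import torch
import torch.nn.functional as F

def masked_fgsm(model, x, y, object_mask, eps=8/255):
    """FGSM-style perturbation applied only inside a binary object mask.

    object_mask: tensor of shape (N, 1, H, W) with 1 on object pixels, 0 elsewhere.
    Illustrative attack only, not the paper's training procedure.
    """
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    delta = eps * grad.sign() * object_mask   # no perturbation outside the object
    return (x + delta).clamp(0, 1).detach()
```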


On the Trade-offs between Adversarial Robustness and Actionable Explanations

Krishna, Satyapriya, Agarwal, Chirag, Lakkaraju, Himabindu

arXiv.org Artificial Intelligence

As machine learning models are increasingly being employed in various high-stakes settings, it becomes important to ensure that predictions of these models are not only adversarially robust, but also readily explainable to relevant stakeholders. However, it is unclear if these two notions can be simultaneously achieved or if there exist trade-offs between them. In this work, we make one of the first attempts at studying the impact of adversarially robust models on actionable explanations which provide end users with a means for recourse. We theoretically and empirically analyze the cost (ease of implementation) and validity (probability of obtaining a positive model prediction) of recourses output by state-of-the-art algorithms when the underlying models are adversarially robust vs. non-robust. More specifically, we derive theoretical bounds on the differences between the cost and the validity of the recourses generated by state-of-the-art algorithms for adversarially robust vs. non-robust linear and non-linear models. Our empirical results with multiple real-world datasets validate our theoretical results and show the impact of varying degrees of model robustness on the cost and validity of the resulting recourses. Our analyses demonstrate that adversarially robust models significantly increase the cost and reduce the validity of the resulting recourses, thus shedding light on the inherent trade-offs between adversarial robustness and actionable explanations.
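
As an illustrative sketch of the two quantities being compared (the recourse method shown is a generic Wachter-style counterfactual search, not necessarily one of the algorithms studied; the model and hyperparameters are placeholders), cost is measured here as the distance of the recourse from the original input and validity as whether the model now returns a positive prediction.

```python
import torch

def find_recourse(model, x, steps=200, lr=0.05, lam=0.1):
    """Wachter-style counterfactual: move x until the (single-logit) model predicts
    the positive class, penalizing distance from the original input. Illustrative only."""
    x_cf = x.clone().requires_grad_(True)
    opt = torch.optim.Adam([x_cf], lr=lr)
    target = torch.ones(x.shape[0])
    for _ in range(steps):
        opt.zero_grad()
        p = torch.sigmoid(model(x_cf)).squeeze(-1)
        loss = torch.nn.functional.binary_cross_entropy(p, target) \
               + lam * (x_cf - x).norm(dim=-1).mean()
        loss.backward()
        opt.step()
    return x_cf.detach()

def cost_and_validity(model, x, x_cf):
    cost = (x_cf - x).norm(dim=-1)                                      # ease of implementation proxy
    validity = (torch.sigmoid(model(x_cf)).squeeze(-1) > 0.5).float()   # positive prediction obtained?
    return cost.mean().item(), validity.mean().item()
```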


Addressing Mistake Severity in Neural Networks with Semantic Knowledge

Abreu, Natalie, Vaska, Nathan, Helus, Victoria

arXiv.org Artificial Intelligence

Robustness in deep neural networks, and in machine learning algorithms in general, is an open research challenge. In particular, it is difficult to ensure that algorithmic performance is maintained on out-of-distribution inputs or anomalous instances that cannot be anticipated at training time. Embodied agents will be deployed in these conditions and are likely to make incorrect predictions. An agent will be viewed as untrustworthy unless it can maintain its performance in dynamic environments. Most robust training techniques aim to improve model accuracy on perturbed inputs; as an alternate form of robustness, we aim to reduce the severity of mistakes made by neural networks in challenging conditions. We leverage current adversarial training methods to generate targeted adversarial attacks during the training process in order to increase the semantic similarity between a model's predictions and the true labels of misclassified instances. Results demonstrate that our approach performs better with respect to mistake severity compared to standard and adversarially trained models. We also find an intriguing role that non-robust features play with regard to semantic similarity.
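
A hedged sketch of the core mechanism described above: inputs are perturbed with a targeted PGD attack whose target is a semantically similar class, so training pushes errors toward less severe ones. The source of the similarity matrix (e.g., a label hierarchy or label embeddings) and the attack settings are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def targeted_pgd(model, x, target, eps=8/255, alpha=2/255, steps=7):
    """Targeted L-inf PGD: descend the loss with respect to the *target* label."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), target)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() - alpha * grad.sign()   # move toward the target class
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def semantic_targets(y_true, similarity):
    """Pick, for each true label, the most semantically similar *other* class.

    similarity: float (num_classes, num_classes) matrix, e.g. derived from a label
    hierarchy or label embeddings (an assumption, not the paper's exact source)."""
    sim = similarity.clone()
    sim.fill_diagonal_(float('-inf'))   # exclude the true class itself
    return sim[y_true].argmax(dim=1)
```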


Adversarially Robust: The Benefits of Peripheral Vision for Machines

#artificialintelligence

New research from MIT suggests that a certain type of computer-vision model, trained to be robust to imperceptible noise added to image data, encodes visual representations similarly to the way humans do using peripheral vision. The researchers find similarities between how some computer-vision systems process images and how humans see out of the corners of our eyes. Perhaps computer vision and human vision have more in common than meets the eye? These models, known as adversarially robust models, are designed to overcome subtle bits of noise that have been added to image data.


The benefits of peripheral vision for machines

#artificialintelligence

Perhaps computer vision and human vision have more in common than meets the eye? Research from MIT suggests that a certain type of robust computer-vision model perceives visual representations similarly to the way humans do using peripheral vision. These models, known as adversarially robust models, are designed to overcome subtle bits of noise that have been added to image data. The way these models learn to transform images is similar to some elements involved in human peripheral processing, the researchers found. But because machines do not have a visual periphery, little work on computer vision models has focused on peripheral processing, says senior author Arturo Deza, a postdoc in the Center for Brains, Minds, and Machines.


Adversarial robustness as a prior for better transfer learning - Microsoft Research

#artificialintelligence

Editor's note: This post and its research are the collaborative efforts of our team, which includes Andrew Ilyas (PhD Student, MIT), Logan Engstrom (PhD Student, MIT), Aleksander Mądry (Professor at MIT), and Ashish Kapoor (Partner Research Manager). In practical machine learning, it is desirable to be able to transfer learned knowledge from some "source" task to downstream "target" tasks. This is known as transfer learning: a simple and efficient way to obtain performant machine learning models, especially when there is little training data or compute available for solving the target task. Transfer learning is very useful in practice. For example, it allows perception models on a robot or other autonomous system to be trained on a synthetic dataset generated via a high-fidelity simulator, such as AirSim, and then refined on a small dataset collected in the real world.
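
A minimal sketch of the fixed-feature transfer recipe the post discusses: start from an adversarially robust backbone and train only a new linear head on the target task. The checkpoint path, the 10-class head, and the non-strict loading are placeholders; the robust weights themselves would come from a separately trained or downloaded model.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load an ImageNet-shaped ResNet-50 and (hypothetically) restore adversarially
# robust weights from a local checkpoint; the path is a placeholder.
backbone = models.resnet50(weights=None)
state = torch.load("robust_resnet50.pt", map_location="cpu")
backbone.load_state_dict(state, strict=False)

# Freeze the robust feature extractor and train only a new linear head
# for a 10-class downstream task ("fixed-feature" transfer).
for p in backbone.parameters():
    p.requires_grad = False
backbone.fc = nn.Linear(backbone.fc.in_features, 10)   # replaces the ImageNet head

optimizer = torch.optim.SGD(backbone.fc.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()
# ...standard training loop over the target dataset goes here.
```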


High-frequency component helps explain the generalization of convolutional neural networks

AIHub

There are many works aiming to explain the generalization behavior of neural networks using heavy mathematical machinery, but we will do something different here: with a simple and intuitive twist of the data, we will show that many generalization mysteries (like adversarial vulnerability, BatchNorm's efficacy, and the "generalization paradox") might be results of our overconfidence in processing data through naked eyes. The models may not have outsmarted us, but the data has. Let's start with an interesting observation (Figure 1): we trained a ResNet-18 on the CIFAR-10 dataset, picked a test sample, and plotted the model's prediction confidence for this sample. Then we mapped the sample into the frequency domain through the Fourier transform, and split the frequency representation into its high-frequency component (HFC) and low-frequency component (LFC). Although this phenomenon can only be observed with a subset of samples (600 images), it is striking enough to raise an alarm.
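
A minimal NumPy sketch of the decomposition described above (the cut-off radius and image size are arbitrary choices): the image is moved to the frequency domain with a 2-D FFT, a circular low-pass mask separates the low-frequency component from the high-frequency component, and each component is mapped back to pixel space, where it could be fed to the classifier separately.

```python
import numpy as np

def split_frequency(img, radius=8):
    """Split a (H, W) grayscale image into low- and high-frequency components.

    `radius` controls the cut-off in the centered Fourier spectrum; applying this
    per channel extends it to RGB images."""
    f = np.fft.fftshift(np.fft.fft2(img))          # centered 2-D spectrum
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
    low_mask = dist <= radius                      # keep only low frequencies
    lfc = np.real(np.fft.ifft2(np.fft.ifftshift(f * low_mask)))
    hfc = np.real(np.fft.ifft2(np.fft.ifftshift(f * (~low_mask))))
    return lfc, hfc

# Example: a 32x32 CIFAR-sized image (random here) split into its two components.
img = np.random.rand(32, 32)
lfc, hfc = split_frequency(img)
assert np.allclose(lfc + hfc, img, atol=1e-8)      # the two components sum back to the image
```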

