adversarial example


Defending Against Adversarial Examples with K-Nearest Neighbor

#artificialintelligence

Robustness is an increasingly important property of machine learning models as they become more and more prevalent. We propose a defense against adversarial examples based on a k-nearest neighbor (kNN) on the intermediate activation of neural networks. With our models, the mean perturbation norm required to fool our MNIST model is 3.07 and 2.30 on CIFAR-10. Additionally, we propose a simple certifiable lower bound on the l2-norm of the adversarial perturbation using a more specific version of our scheme, a 1-NN on representations learned by a Lipschitz network. Our model provides a nontrivial average lower bound of the perturbation norm, comparable to other schemes on MNIST with similar clean accuracy.


Defending Against Adversarial Examples with K-Nearest Neighbor

#artificialintelligence

Robustness is an increasingly important property of machine learning models as they become more and more prevalent. We propose a defense against adversarial examples based on a k-nearest neighbor (kNN) on the intermediate activation of neural networks. With our models, the mean perturbation norm required to fool our MNIST model is 3.07 and 2.30 on CIFAR-10. Additionally, we propose a simple certifiable lower bound on the l2-norm of the adversarial perturbation using a more specific version of our scheme, a 1-NN on representations learned by a Lipschitz network. Our model provides a nontrivial average lower bound of the perturbation norm, comparable to other schemes on MNIST with similar clean accuracy.


The US is making digital camouflage so that AIs can't spot spy planes

New Scientist

Digital camouflage could make tanks look like cows or turn aircraft into clouds – but only if the observer is a computer. Artificially intelligent systems that recognise objects automatically have an Achilles heel. Slightly tweaked images, called adversarial examples, can fool them into misidentifying an object in ways absurd to humans. These adversarial examples include a plastic turtle which AIs mistake for a rifle and spectacle frames which baffle facial recognition software into misidentifying the wearer as someone else.


Convergence of Adversarial Training in Overparametrized Networks

arXiv.org Machine Learning

Neural networks are vulnerable to adversarial examples, i.e. inputs that are imperceptibly perturbed from natural data and yet incorrectly classified by the network. Adversarial training, a heuristic form of robust optimization that alternates between minimization and maximization steps, has proven to be among the most successful methods to train networks that are robust against a pre-defined family of perturbations. This paper provides a partial answer to the success of adversarial training. When the inner maximization problem can be solved to optimality, we prove that adversarial training finds a network of small robust train loss. When the maximization problem is solved by a heuristic algorithm, we prove that adversarial training finds a network of small robust surrogate train loss. The analysis technique leverages recent work on the analysis of neural networks via Neural Tangent Kernel (NTK), combined with online-learning when the maximization is solved by a heuristic, and the expressiveness of the NTK kernel in the $\ell_\infty$-norm.


A unified view on differential privacy and robustness to adversarial examples

arXiv.org Machine Learning

This short note highlights some links between two lines of research within the emerging topic of trustworthy machine learning: differential privacy and robustness to adversarial examples. By abstracting the definitions of both notions, we show that they build upon the same theoretical ground and hence results obtained so far in one domain can be transferred to the other. More precisely, our analysis is based on two key elements: probabilistic mappings (also called randomized algorithms in the differential privacy community), and the Renyi divergence which subsumes a large family of divergences. We first generalize the definition of robustness against adversarial examples to encompass probabilistic mappings. Then we observe that Renyi-differential privacy (a generalization of differential privacy recently proposed in~\cite{Mironov2017RenyiDP}) and our definition of robustness share several similarities. We finally discuss how can both communities benefit from this connection to transfer technical tools from one research field to the other.


Misleading Failures of Partial-input Baselines

arXiv.org Artificial Intelligence

Recent work establishes dataset difficulty and removes annotation artifacts via partial-input baselines (e.g., hypothesis-only models for SNLI or question-only models for VQA). When a partial-input baseline gets high accuracy, a dataset is cheatable. However, the converse is not necessarily true: the failure of a partial-input baseline does not mean a dataset is free of artifacts. To illustrate this, we first design artificial datasets which contain trivial patterns in the full input that are undetectable by any partial-input model. Next, we identify such artifacts in the SNLI dataset - a hypothesis-only model augmented with trivial patterns in the premise can solve 15% of the examples that are previously considered "hard". Our work provides a caveat for the use of partial-input baselines for dataset verification and creation.


Improving Black-box Adversarial Attacks with a Transfer-based Prior

arXiv.org Machine Learning

We consider the black-box adversarial setting, where the adversary has to generate adversarial perturbations without access to the target models to compute gradients. Previous methods tried to approximate the gradient either by using a transfer gradient of a surrogate white-box model, or based on the query feedback. However, these methods often suffer from low attack success rates or poor query efficiency since it is non-trivial to estimate the gradient in a high-dimensional space with limited information. To address these problems, we propose a prior-guided random gradient-free (P-RGF) method to improve black-box adversarial attacks, which takes the advantage of a transfer-based prior and the query information simultaneously. The transfer-based prior given by the gradient of a surrogate model is appropriately integrated into our algorithm by an optimal coefficient derived by a theoretical analysis. Extensive experiments demonstrate that our method requires much fewer queries to attack black-box models with higher success rates compared with the alternative state-of-the-art methods.


The Attack Generator: A Systematic Approach Towards Constructing Adversarial Attacks

arXiv.org Machine Learning

Most state-of-the-art machine learning (ML) classification systems are vulnerable to adversarial perturbations. As a consequence, adversarial robustness poses a significant challenge for the deployment of ML-based systems in safety- and security-critical environments like autonomous driving, disease detection or unmanned aerial vehicles. In the past years we have seen an impressive amount of publications presenting more and more new adversarial attacks. However, the attack research seems to be rather unstructured and new attacks often appear to be random selections from the unlimited set of possible adversarial attacks. With this publication, we present a structured analysis of the adversarial attack creation process. By detecting different building blocks of adversarial attacks, we outline the road to new sets of adversarial attacks. We call this the "attack generator". In the pursuit of this objective, we summarize and extend existing adversarial perturbation taxonomies. The resulting taxonomy is then linked to the application context of computer vision systems for autonomous vehicles, i.e. semantic segmentation and object detection. Finally, in order to prove the usefulness of the attack generator, we investigate existing semantic segmentation attacks with respect to the detected defining components of adversarial attacks.


Adversarial attacks on Copyright Detection Systems

arXiv.org Machine Learning

It is well-known that many machine learning models are susceptible to so-called "adversarial attacks," in which an attacker evades a classifier by making small perturbations to inputs. This paper discusses how industrial copyright detection tools, which serve a central role on the web, are susceptible to adversarial attacks. We discuss a range of copyright detection systems, and why they are particularly vulnerable to attacks. These vulnerabilities are especially apparent for neural network based systems. As a proof of concept, we describe a well-known music identification method, and implement this system in the form of a neural net. We then attack this system using simple gradient methods. Adversarial music created this way successfully fools industrial systems, including the AudioTag copyright detector and YouTube's Content ID system. Our goal is to raise awareness of the threats posed by adversarial examples in this space, and to highlight the importance of hardening copyright detection systems to attacks.


Adversarial Training Generalizes Data-dependent Spectral Norm Regularization

arXiv.org Machine Learning

We establish a theoretical link between adversarial training and operator norm regularization for deep neural networks. Specifically, we show that adversarial training is a data-dependent generalization of spectral norm regularization. This intriguing connection provides fundamental insights into the origin of adversarial vulnerability and hints at novel ways to robustify and defend against adversarial attacks. We provide extensive empirical evidence to support our theoretical results.