Fast Geometrically-Perturbed Adversarial Faces

arXiv.org Machine Learning

The state-of-the-art performance of deep learning algorithms has led to a considerable increase in the use of machine learning in security-sensitive and critical applications. However, it has recently been shown that a small and carefully crafted perturbation in the input space can completely fool a deep model. In this study, we explore the extent to which face recognition systems are vulnerable to geometrically-perturbed adversarial faces. We propose a fast landmark manipulation method for generating adversarial faces, which is approximately 200 times faster than previous geometric attacks and achieves a 99.86% success rate against state-of-the-art face recognition models. To further encourage the generated samples to look natural, we introduce a second attack constrained by the semantic structure of the face; it runs at half the speed of the first attack and achieves a 99.96% success rate. Both attacks are highly robust against state-of-the-art defense methods, retaining success rates of at least 53.59%. Code is available at https://github.com/alldbi/FLM.
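The sketch below is a minimal illustration of a geometric (spatial) adversarial perturbation in the spirit of this abstract, not the authors' FLM implementation: it optimizes a small dense flow field with gradient ascent on the classification loss, whereas the paper restricts the displacements to detected facial landmarks. `model` (a differentiable face classifier returning logits), `x` (a 1x3xHxW face crop in [0, 1]), and `true_label` are assumed placeholders.

import torch
import torch.nn.functional as F

def spatial_attack(model, x, true_label, steps=10, step_size=0.01):
    """Hedged sketch of a geometric attack via a small warp field.

    true_label: LongTensor of shape (N,) with the correct class indices.
    """
    n, _, h, w = x.shape
    # Identity sampling grid in [-1, 1] normalized coordinates.
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    base_grid = torch.stack((xs, ys), dim=-1).unsqueeze(0).repeat(n, 1, 1, 1)

    # Displacement field to optimize (the geometric perturbation); the paper
    # instead optimizes sparse displacements at facial landmarks.
    flow = torch.zeros_like(base_grid, requires_grad=True)

    for _ in range(steps):
        warped = F.grid_sample(x, base_grid + flow, align_corners=True)
        loss = F.cross_entropy(model(warped), true_label)
        grad, = torch.autograd.grad(loss, flow)
        with torch.no_grad():
            flow += step_size * grad.sign()   # ascend the loss
            flow.clamp_(-0.05, 0.05)          # keep the warp imperceptible
    return F.grid_sample(x, base_grid + flow, align_corners=True).detach()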


facebookresearch/adversarial_image_defenses

#artificialintelligence

This package implements the experiments described in the paper Countering Adversarial Images Using Input Transformations. It contains implementations of adversarial attacks, defenses based on image transformations, and code for training and testing convolutional networks under adversarial attack using our defenses. We also provide pre-trained models.
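For intuition, here is a generic sketch of two input-transformation defenses studied in that paper, JPEG compression and bit-depth reduction, written with PIL and NumPy. This is not the package's API; consult the repository for the actual implementation (which also covers cropping-rescaling, total variance minimization, and image quilting).

import io
import numpy as np
from PIL import Image

def jpeg_compress(img: Image.Image, quality: int = 75) -> Image.Image:
    """Re-encode as JPEG to wash out high-frequency adversarial noise."""
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")

def reduce_bit_depth(img: Image.Image, bits: int = 3) -> Image.Image:
    """Quantize pixel values to 2**bits levels before classification."""
    arr = np.asarray(img).astype(np.float32) / 255.0
    levels = 2 ** bits - 1
    arr = np.round(arr * levels) / levels
    return Image.fromarray((arr * 255).astype(np.uint8))

# At test time the (possibly adversarial) input is transformed first and
# only then passed to the classifier, e.g. model(transform(x)).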


AI Is Easy to Fool--Why That Needs to Change

#artificialintelligence

Con artistry is one of the world's oldest and most innovative professions, and it may soon have a new target. Research suggests artificial intelligence may be uniquely susceptible to tricksters, and as its influence in the modern world grows, attacks against it are likely to become more common. The root of the problem lies in the fact that artificial intelligence algorithms learn about the world in very different ways than people do, and so slight tweaks to the data fed into these algorithms can throw them off completely while remaining imperceptible to humans. Much of the research into this area has been conducted on image recognition systems, in particular those relying on deep learning neural networks. These systems are trained by showing them thousands of examples of images of a particular object until they can extract common features that allow them to accurately spot the object in new images.
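As a concrete illustration of the "slight tweaks" mentioned above, the following is a textbook fast gradient sign method (FGSM) sketch: each pixel is nudged a tiny step in the direction that increases the classifier's loss. `model`, `x` (images in [0, 1]), and `y` (true labels) are assumed placeholders; this is the standard attack, not any specific system discussed in this digest.

import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=2/255):
    """One-step perturbation that is imperceptible yet can flip predictions."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # +/- eps per pixel is invisible to humans but shifts the decision.
    x_adv = (x + eps * x.grad.sign()).clamp(0, 1)
    return x_adv.detach()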


Strong Black-box Adversarial Attacks on Unsupervised Machine Learning Models

arXiv.org Machine Learning

Machine Learning (ML) and Deep Learning (DL) models have achieved state-of-the-art performance on multiple learning tasks, from vision to natural language modelling. With the growing adoption of ML and DL across many areas of computer science, recent research has also started focusing on the security properties of these models. Much work has been undertaken to understand whether (deep) neural network architectures are resilient to black-box adversarial attacks, which craft perturbed input samples that fool the classifier without knowledge of the architecture used. Recent work has also examined the transferability of adversarial attacks and found that they are generally easily transferable between models, datasets, and techniques. However, such attacks and their analysis have not been covered from the perspective of unsupervised machine learning algorithms. In this paper, we seek to bridge this gap through multiple contributions. We first provide a strong (iterative) black-box adversarial attack that can craft adversarial samples which will be incorrectly clustered irrespective of the choice of clustering algorithm. We use four prominent clustering algorithms and a real-world dataset to demonstrate the proposed adversarial algorithm. Using these clustering algorithms, we also carry out a simple study of cross-technique adversarial attack transferability.
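The snippet below is a deliberately simple, hedged sketch of the black-box setting described here, not the paper's algorithm: the attacker only queries hard cluster assignments (scikit-learn's KMeans stands in for the target model) and nudges a sample toward a point the oracle places in another cluster, under an L2 budget.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
clusterer = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

def blackbox_cluster_attack(assign, x, X_pool, step=0.05, budget=2.0):
    """assign: black-box function mapping samples to cluster ids."""
    orig = assign(x[None])[0]
    # Pick any pool point the oracle assigns to a different cluster.
    target = next(p for p in X_pool if assign(p[None])[0] != orig)
    direction = (target - x) / np.linalg.norm(target - x)
    x_adv = x.copy()
    while np.linalg.norm(x_adv - x) < budget:
        x_adv = x_adv + step * direction
        if assign(x_adv[None])[0] != orig:   # assignment flipped
            return x_adv
    return x_adv  # budget exhausted without flipping the cluster

x_adv = blackbox_cluster_attack(clusterer.predict, X[0], X)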


Black-box Adversarial Attacks on Video Recognition Models

arXiv.org Machine Learning

Deep neural networks (DNNs) are known for their vulnerability to adversarial examples. These are examples that have undergone a small, carefully crafted perturbation, and which can easily fool a DNN into making misclassifications at test time. Thus far, the field of adversarial research has mainly focused on image models, under either a white-box setting, where an adversary has full access to model parameters, or a black-box setting where an adversary can only query the target model for probabilities or labels. Whilst several white-box attacks have been proposed for video models, black-box video attacks are still unexplored. To close this gap, we propose the first black-box video attack framework, called V-BAD. V-BAD is a general framework for adversarial gradient estimation and rectification, based on Natural Evolution Strategies (NES). In particular, V-BAD utilizes tentative perturbations transferred from image models, and partition-based rectifications found by the NES on partitions (patches) of tentative perturbations, to obtain good adversarial gradient estimates with fewer queries to the target model. V-BAD is equivalent to estimating the projection of an adversarial gradient on a selected subspace. Using three benchmark video datasets, we demonstrate that V-BAD can craft both untargeted and targeted attacks to fool two state-of-the-art deep video recognition models. For the targeted attack, it achieves a >93% success rate using only 3.4-8.4 x 10^4 queries on average, a similar number of queries to state-of-the-art black-box image attacks. This is despite the fact that videos often have two orders of magnitude higher dimensionality than static images. We believe that V-BAD is a promising new tool to evaluate and improve the robustness of video recognition models to black-box adversarial attacks.
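For context, here is a hedged sketch of the plain NES gradient estimator that V-BAD builds on (the paper adds tentative perturbations and partition-based rectification on top). `loss_fn` is assumed to return a scalar adversarial loss computed from query-only model outputs; this is generic NES, not the V-BAD code.

import torch

def nes_gradient(loss_fn, x, sigma=1e-3, n_samples=50):
    """Estimate the gradient of loss_fn at x with antithetic Gaussian sampling."""
    grad = torch.zeros_like(x)
    for _ in range(n_samples):
        u = torch.randn_like(x)
        # Two queries per sample (an antithetic pair) reduce estimator variance.
        grad += (loss_fn(x + sigma * u) - loss_fn(x - sigma * u)) * u
    return grad / (2 * sigma * n_samples)

# A black-box attack then takes signed steps with the estimated gradient:
# x_adv = (x_adv + alpha * nes_gradient(loss_fn, x_adv).sign()).clamp(0, 1)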