Deep networks are well-known to be fragile to adversarial attacks. Using several standard image datasets and established attack mechanisms, we conduct an empirical analysis of deep representations under attack, and find that the attack causes the internal representation to shift closer to the "false" class. Motivated by this observation, we propose to regularize the representation space under attack with metric learning in order to produce more robust classifiers. By carefully sampling examples for metric learning, our learned representation not only increases robustness, but also can detect previously unseen adversarial samples. Quantitative experiments show improvement of robustness accuracy by up to 4\% and detection efficiency by up to 6\% according to Area Under Curve (AUC) score over baselines.
This package implements the experiments described in the paper Countering Adversarial Images Using Input Transformations. It contains implementations for adversarial attacks, defenses based image transformations, training, and testing convolutional networks under adversarial attacks using our defenses. We also provide pre-trained models.
Adversarial examples are intentionally perturbed images that mislead classifiers. These images can, however, be easily detected using denoising algorithms, when high-frequency spatial perturbations are used, or can be noticed by humans, when perturbations are large. In this paper, we propose EdgeFool, an adversarial image enhancement filter that learns structure-aware adversarial perturbations. EdgeFool generates adversarial images with perturbations that enhance image details via training a fully convolutional neural network end-to-end with a multi-task loss function. This loss function accounts for both image detail enhancement and class misleading objectives. We evaluate EdgeFool on three classifiers (ResNet-50, ResNet-18 and AlexNet) using two datasets (ImageNet and Private-Places365) and compare it with six adversarial methods (DeepFool, SparseFool, Carlini-Wagner, SemanticAdv, Non-targeted and Private Fast Gradient Sign Methods).
Today's state-of-the-art image classifiers fail to correctly classify carefully manipulated adversarial images. In this work, we develop a new, localized adversarial attack that generates adversarial examples by imperceptibly altering the backgrounds of normal images. We first use this attack to highlight the unnecessary sensitivity of neural networks to changes in the background of an image, then use it as part of a new training technique: localized adversarial training. By including locally adversarial images in the training set, we are able to create a classifier that suffers less loss than a non-adversarially trained counterpart model on both natural and adversarial inputs. The evaluation of our localized adversarial training algorithm on MNIST and CIFAR-10 datasets shows decreased accuracy loss on natural images, and increased robustness against adversarial inputs.
As intelligent as AI is becoming, it's still not clever enough to fool the most determined of hackers. Just last year, researchers tricked a commercial facial recognition system into thinking they were someone they weren't just by wearing a pair of patterned glasses. It was simply a sticker with a hallucinogenic print on it, but to the AI it was so much more. Because of the twists and curves of the pattern to the computer the glasses resembled someone's face, and by altering the patterns, the researchers could choose any face they wanted and that's what the AI saw. This type of cyber security is a relatively new form and has been given the name of "adversarial machine learning".