Review: Adversarial Examples that Fool both Computer Vision and Time-Limited Humans

Neural Information Processing Systems 

This paper presents an interesting experiment showing that adversarial examples that transfer across recently proposed deep learning models can also influence the visual classifications produced by time-limited humans (at most 2.2 to 2.5 seconds to view the image and make the classification). The setup asks 38 observers to classify images into one of 2 classes within each of three separate groups (pets: cat or dog; hazard: snake or spider; vegetables: broccoli or cabbage). The images are presented in four conditions: 1) original, the unmodified image; 2) adversarial, the image manipulated so that it is misclassified by an ensemble of 10 recently proposed deep learning models; 3) flip, the original image perturbed by a vertically flipped adversarial perturbation (a control case showing that a low signal-to-noise ratio alone does not explain poor human classification); and 4) false, an image from outside the three groups, manipulated by an adversarial perturbation so as to be classified as one of the classes within one of the three groups. Overall, the results show that 1) adversarial perturbations in the false condition successfully biased human classification toward the target class, and 2) adversarial perturbations caused observers to select the incorrect class even when the correct class was available (the adversarial condition described above). I found this paper quite interesting to read and I support its publication.
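To make the adversarial and flip conditions concrete, here is a minimal NumPy sketch of the underlying idea: craft a perturbation that pushes an ensemble's average prediction toward a target class (an FGSM-style step), then build the flip control by vertically flipping that same perturbation. The two-member linear "ensemble" and all sizes are illustrative stand-ins, not the paper's actual 10 deep models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the ensemble: two tiny linear two-class
# classifiers over a flattened 8x8 "image" (illustrative only).
W = [rng.standard_normal((2, 64)) for _ in range(2)]

def ensemble_logits(x):
    """Average the two-class logits across ensemble members."""
    return np.mean([w @ x.ravel() for w in W], axis=0)

def fgsm_perturbation(x, target, eps=0.1):
    """One FGSM-style step toward `target` on the ensemble's average
    logits. For linear members, the gradient of the target-class
    margin is available in closed form."""
    grad = np.mean([w[target] - w[1 - target] for w in W], axis=0)
    return eps * np.sign(grad).reshape(x.shape)

x = rng.standard_normal((8, 8))          # clean "image"
delta = fgsm_perturbation(x, target=1)   # adversarial perturbation
x_adv = x + delta                        # "adversarial" condition
x_flip = x + np.flipud(delta)            # "flip" control condition
```

The flip control keeps the perturbation's magnitude statistics identical while destroying its spatial alignment with the image, which is what lets the paper attribute human errors to the perturbation's structure rather than to added noise alone.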