Seven Myths in Machine Learning Research
While neural networks are commonly believed to be black boxes, there have been many attempts to interpret them. Saliency maps, and other similar methods that assign importance scores to features or training examples, are the most popular form of interpretation. It is tempting to conclude that a given image is classified a certain way because of particular parts of the image that were salient to the neural network's decision. There are several ways to compute a saliency map, often making use of a neural network's activations on a given image and the gradients that flow through the network. In Ghorbani et al. [2017], the authors show that they can introduce an imperceptible perturbation to a given image that distorts its saliency map. According to the perturbed saliency map, a monarch butterfly is classified as a monarch butterfly not on account of the patterns on its wings, but because of some unimportant green leaves in the background.
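To make the gradient-based approach concrete, here is a minimal sketch of a vanilla gradient saliency map in PyTorch. It is not the specific attribution method attacked in the paper, and the pretrained ResNet-18 and the image file monarch.jpg are assumptions made purely for illustration:

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Pretrained ImageNet classifier (any classifier would do for this sketch).
model = models.resnet18(pretrained=True).eval()

# Standard ImageNet preprocessing; "monarch.jpg" is a hypothetical input file.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
img = preprocess(Image.open("monarch.jpg").convert("RGB")).unsqueeze(0)
img.requires_grad_(True)

# Backpropagate the score of the predicted class to the input pixels.
logits = model(img)
pred = logits.argmax(dim=1).item()
logits[0, pred].backward()

# Vanilla gradient saliency: largest absolute gradient across color channels.
saliency = img.grad.abs().max(dim=1).values.squeeze(0)  # shape (224, 224)
```

The fragility result is that a second image, visually indistinguishable from the first and receiving the same prediction, can produce a very different saliency map.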