Removing input features via a generative model to explain their attributions to classifier's decisions

arXiv.org Machine Learning

Instead of removing an input feature with a heuristic such as graying it out, adding noise, or blurring it, we propose to integrate a generative inpainter into three representative attribution methods to remove the feature. Compared to their original counterparts, our methods (1) generate more plausible counterfactual samples under the true data-generating process; (2) are more robust to hyperparameter settings; and (3) localize objects more accurately. Our findings were consistent across both the ImageNet and Places365 datasets and two different pairs of classifiers and inpainters.

Explaining a classifier's outputs given a certain input is increasingly important, especially for life-critical applications (Doshi-Velez & Kim, 2017). A popular means of visually explaining an image classifier's decisions is an attribution map, i.e., a heatmap that highlights the input pixels that are evidence for and against the classification outputs (Montavon et al., 2018). To construct an attribution map, many methods approximate the attribution value of an input region by the change in classification probability when that region is absent, i.e., removed from the image. In practice, most perturbation-based attribution methods implement the absence of an input feature by replacing it with (a) mean pixels, (b) random noise, or (c) a blurred version of the original content. While removing an input feature to measure its attribution is a principled method in causal reasoning, the existing removal (i.e. […]

To combat these two issues, we propose to harness a state-of-the-art generative inpainting model (hereafter, an inpainter) to remove features from an input image and fill the removed region with content that is plausible under the true data distribution. We test our approach on three representative attribution methods, Sliding-Patch (SP) (Zeiler & Fergus, 2014), LIME (Ribeiro et al., 2016), and Meaningful-Perturbation (MP) (Fong & Vedaldi, 2017), across two large-scale datasets, ImageNet (Russakovsky et al., 2015) and Places365 (Zhou et al., 2017). For each dataset, we use a separate pre-trained pair of image classifier and inpainter.

Work done during CA's internship at Auburn University.
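To make the removal-based definition of attribution concrete, here is a minimal sketch (an illustration, not the authors' code) of a Sliding-Patch-style occlusion loop in NumPy. The score of a patch is taken to be p(target | x) minus p(target | x with the patch removed), and the removal step is a pluggable function, shown here with the three heuristic fills the text lists: mean pixels, random noise, and blur. All function names, the (x, mask) -> image signature, the patch and stride sizes, the 5 x 5 box blur, and the toy two-class classifier are hypothetical choices made for this example. The paper's proposal corresponds to passing a pre-trained generative inpainter as the removal function; no inpainter is implemented here.

```python
import numpy as np

# Hypothetical removal operators. Each takes a float image x of shape (H, W, C)
# with values in [0, 1] and a boolean mask of shape (H, W) (True = pixels to
# remove) and returns a copy of x with the masked pixels replaced.

def remove_with_mean(x, mask):
    """(a) Fill the region with the image's per-channel mean pixel."""
    out = x.copy()
    out[mask] = x.mean(axis=(0, 1))
    return out


def remove_with_noise(x, mask, seed=0):
    """(b) Fill the region with uniform random noise in [0, 1)."""
    rng = np.random.default_rng(seed)
    out = x.copy()
    out[mask] = rng.random((int(mask.sum()), x.shape[2]))
    return out


def remove_with_blur(x, mask, k=5):
    """(c) Fill the region with a k x k box-blurred copy of the original (k odd)."""
    h, w, _ = x.shape
    pad = k // 2
    padded = np.pad(x, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    blurred = np.zeros_like(x, dtype=float)
    for dy in range(k):
        for dx in range(k):
            blurred += padded[dy:dy + h, dx:dx + w]
    blurred /= k * k
    out = x.copy()
    out[mask] = blurred[mask]
    return out


# In the paper's approach, a generative inpainter wrapped to the same
# (x, mask) -> image signature would be passed as `remove_fn` instead.

def sliding_patch_attribution(x, predict_proba, target, remove_fn, patch=4, stride=4):
    """Occlusion map: score(patch) = p(target|x) - p(target|x with patch removed).

    Each pixel gets the mean score of the patches covering it; uncovered pixels stay 0.
    """
    h, w, _ = x.shape
    p_orig = predict_proba(x)[target]
    scores = np.zeros((h, w))
    counts = np.zeros((h, w))
    for top in range(0, h - patch + 1, stride):
        for left in range(0, w - patch + 1, stride):
            mask = np.zeros((h, w), dtype=bool)
            mask[top:top + patch, left:left + patch] = True
            drop = p_orig - predict_proba(remove_fn(x, mask))[target]
            scores[mask] += drop
            counts[mask] += 1
    return scores / np.maximum(counts, 1)


def toy_predict_proba(x):
    """Hypothetical 2-class classifier: class 1 fires when the top-left 4x4 block is bright."""
    logit = 10.0 * (x[:4, :4].mean() - 0.5)
    p1 = 1.0 / (1.0 + np.exp(-logit))
    return np.array([1.0 - p1, p1])


if __name__ == "__main__":
    # Toy 8x8 RGB image: white top-left quadrant, black elsewhere.
    img = np.zeros((8, 8, 3))
    img[:4, :4] = 1.0
    for name, fn in [("mean", remove_with_mean), ("noise", remove_with_noise), ("blur", remove_with_blur)]:
        attr = sliding_patch_attribution(img, toy_predict_proba, target=1, remove_fn=fn)
        print(f"{name:>5}: top-left patch score {attr[:4, :4].mean():.3f}")
```

On this toy image, only the top-left patch receives a nonzero score, but its value depends heavily on the fill: roughly 0.92 with the mean fill versus roughly 0.09 with the blur, because the blurred patch stays fairly bright. The sketch keeps the attribution loop fixed and varies only `remove_fn`, which mirrors the text's framing that the removal operator is the component being replaced.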