Reviews: A Benchmark for Interpretability Methods in Deep Neural Networks
Neural Information Processing Systems
Summary --- This paper proposes to evaluate saliency/importance-based visual explanations by removing the pixels an explanation marks as "important" and measuring whether a re-trained classifier can still classify the modified images correctly. Many explanations fail to remove such class-relevant information, but some ensembling techniques succeed, to the point of removing objects entirely; the paper argues that these are therefore better explanations. The underlying view is that important information is information a classifier can use to predict the correct label. Consequently, an importance estimate can be judged by how much accuracy drops when its most important pixels are removed from every image in both the training and validation sets.
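The remove-and-retrain procedure summarized above can be illustrated with a minimal pure-Python sketch. This is not the paper's code: a toy nearest-centroid classifier stands in for retraining a deep network, images are flat 8-pixel lists, and removed pixels are replaced with `0.0` (this toy dataset's background value; the fill value is an assumption, since the summary does not specify one). The estimators `salient` and `positional`, and all helper names, are hypothetical.

```python
from statistics import mean


def ablate(image, importance, fraction, fill=0.0):
    """Replace the `fraction` highest-importance pixels of a flat image with `fill`."""
    k = int(round(fraction * len(image)))
    # Rank pixel indices by decreasing importance; break ties by index for determinism.
    ranked = sorted(range(len(image)), key=lambda i: (-importance[i], i))
    out = list(image)
    for i in ranked[:k]:
        out[i] = fill
    return out


def train_centroids(xs, ys):
    """'Retrain' a toy nearest-centroid classifier from scratch (stand-in for a CNN)."""
    centroids = {}
    for label in sorted(set(ys)):
        rows = [x for x, y in zip(xs, ys) if y == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    # Nearest centroid by squared Euclidean distance; ties go to the smallest label.
    return min(
        centroids,
        key=lambda c: (sum((a - b) ** 2 for a, b in zip(x, centroids[c])), c),
    )


def roar_accuracy(train, test, explain, fraction):
    """Ablate BOTH splits using `explain`, retrain, and return test accuracy."""
    def ablate_split(split):
        xs, ys = split
        return [ablate(x, explain(x), fraction) for x in xs], ys

    (tr_x, tr_y), (te_x, te_y) = ablate_split(train), ablate_split(test)
    model = train_centroids(tr_x, tr_y)
    return sum(predict(model, x) == y for x, y in zip(te_x, te_y)) / len(te_y)


def make_split(n):
    """Toy 8-pixel images: label 0 lights pixel 2, label 1 lights pixel 5."""
    xs, ys = [], []
    for j in range(n):
        label = j % 2
        img = [0.0] * 8
        img[2 if label == 0 else 5] = 1.0
        xs.append(img)
        ys.append(label)
    return xs, ys


def salient(x):
    """Hypothetical estimator that points at the object: importance = intensity."""
    return list(x)


def positional(x):
    """Hypothetical estimator that ignores the input: leftmost pixels rank first."""
    return [-i for i in range(len(x))]


train, test = make_split(20), make_split(10)

print(roar_accuracy(train, test, salient, 0.0))       # → 1.0 (nothing removed)
print(roar_accuracy(train, test, salient, 1 / 8))     # → 0.5 (object removed, chance level)
print(roar_accuracy(train, test, positional, 1 / 8))  # → 1.0 (only background removed)
```

Under this toy setup, the estimator that actually locates the class evidence drives the retrained model to chance, while the uninformative one leaves accuracy untouched, which is the sense in which a larger drop indicates a better importance estimate.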
Feb-5-2025, 22:06:18 GMT