A Benchmark for Interpretability Methods in Deep Neural Networks
Neural Information Processing Systems
We propose an empirical measure of the approximate accuracy of feature importance estimates in deep neural networks. Our results across several large-scale image classification datasets show that many popular interpretability methods produce estimates of feature importance that are no better than a random designation of feature importance.
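The comparison against a random baseline can be sketched as follows. This is a minimal illustration of the idea, not the paper's exact evaluation protocol (which additionally involves retraining); the function names and the zero-fill masking choice are my own assumptions:

```python
import numpy as np

def mask_top_fraction(image, importance, fraction, fill=0.0):
    """Replace the `fraction` of pixels ranked most important with `fill`.

    `importance` is a per-pixel score map produced by an interpretability
    method (e.g. a saliency map), same shape as `image`.
    """
    flat_scores = importance.ravel()
    k = int(fraction * flat_scores.size)
    top_idx = np.argsort(flat_scores)[::-1][:k]  # indices of top-k scores
    masked = image.ravel().copy()
    masked[top_idx] = fill
    return masked.reshape(image.shape)

def mask_random_fraction(image, fraction, rng, fill=0.0):
    """Baseline: replace a uniformly random `fraction` of pixels with `fill`."""
    masked = image.ravel().copy()
    k = int(fraction * masked.size)
    rand_idx = rng.choice(masked.size, size=k, replace=False)
    masked[rand_idx] = fill
    return masked.reshape(image.shape)
```

If an importance estimator is informative, degrading the top-ranked pixels should hurt a classifier's accuracy more than degrading the same number of randomly chosen pixels; the paper's finding is that for many popular methods the gap is negligible.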