A Benchmark for Interpretability Methods in Deep Neural Networks

Dec-26-2025, 04:37:25 GMT–Neural Information Processing Systems

We propose an empirical measure of the approximate accuracy of feature importance estimates in deep neural networks. Our results across several large-scale image classification datasets show that many popular interpretability methods produce estimates of feature importance that are no better than a random designation of feature importance.
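The abstract does not spell out how the measure works. The following is a minimal sketch, assuming a remove-and-retrain style of evaluation. It ranks each example's input features by an importance estimate and replaces the top fraction with the training-set mean. It then retrains from scratch and compares test accuracy against the same procedure driven by a random ranking. A larger accuracy drop suggests the estimate found features the model actually relies on. Everything here is a hypothetical stand-in, not the authors' benchmark. A logistic regression on synthetic Gaussian data replaces the deep network and image datasets. Gradient × input (`|w_j * x_ij|` for a linear logit) plays the interpretability method. The helper names (`remove_top`, `remove_and_retrain`, and the rest) are illustrative.

```python
# Hypothetical toy sketch of a remove-and-retrain style evaluation (not the authors' code).
# Assumptions: logistic regression stands in for a deep network, synthetic Gaussian
# features stand in for images, and gradient x input stands in for an interpretability method.
import math
import random


def make_data(n, d, rng):
    """Synthetic data where only features 0 and 1 determine the label."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    y = [1 if row[0] + row[1] > 0 else 0 for row in X]
    return X, y


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, epochs=150, lr=0.5):
    """Full-batch gradient descent on the logistic loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def accuracy(w, b, X, y):
    preds = [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else 0 for xi in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)


def grad_times_input(w):
    # For a linear logit the input gradient is w, so the score is |w_j * x_ij| per example.
    return lambda xi: [abs(wj * xj) for wj, xj in zip(w, xi)]


def random_importance(rng):
    # Baseline: an arbitrary random score for every feature of every example.
    return lambda xi: [rng.random() for _ in xi]


def remove_top(X, importance, fraction, means):
    """Replace each example's top `fraction` of features (by importance) with the training mean."""
    k = int(round(fraction * len(X[0])))
    out = []
    for xi in X:
        scores = importance(xi)
        top = sorted(range(len(xi)), key=lambda j: scores[j], reverse=True)[:k]
        row = list(xi)
        for j in top:
            row[j] = means[j]
        out.append(row)
    return out


def remove_and_retrain(Xtr, ytr, Xte, yte, importance, fraction):
    """Degrade train and test sets the same way, retrain from scratch, report test accuracy."""
    means = [sum(col) / len(col) for col in zip(*Xtr)]
    w, b = train_logreg(remove_top(Xtr, importance, fraction, means), ytr)
    return accuracy(w, b, remove_top(Xte, importance, fraction, means), yte)


rng = random.Random(0)
Xtr, ytr = make_data(500, 10, rng)
Xte, yte = make_data(300, 10, rng)
w, b = train_logreg(Xtr, ytr)

base_acc = accuracy(w, b, Xte, yte)
informed_acc = remove_and_retrain(Xtr, ytr, Xte, yte, grad_times_input(w), 0.5)
random_acc = remove_and_retrain(Xtr, ytr, Xte, yte, random_importance(random.Random(1)), 0.5)
print(f"no removal {base_acc:.3f} | gradient x input {informed_acc:.3f} | random {random_acc:.3f}")
```

In this toy setup only features 0 and 1 carry the label. The gradient × input ranking should therefore remove them from nearly every example and push accuracy toward chance. The random ranking should leave at least one of them in place for many examples. Retraining is the step that separates this from masking inputs of a fixed model. Without it, an accuracy drop could reflect the distribution shift caused by the replacement values rather than the loss of information.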

deep neural network, interpretability method, name change
