A Rate-Distortion Framework for Explaining Black-box Model Decisions
Stefan Kolek, Duc Anh Nguyen, Ron Levie, Joan Bruna, Gitta Kutyniok
arXiv.org Artificial Intelligence
Powerful machine learning models such as deep neural networks are inherently opaque, which has motivated the numerous explanation methods developed by the research community over the last decade [1, 24, 26, 20, 15, 16, 7, 2]. The meaning and validity of an explanation depend on the underlying principle of the explanation framework. A trustworthy explanation framework must therefore align intuition with mathematical rigor while maintaining maximal flexibility and applicability. We believe the Rate-Distortion Explanation (RDE) framework, first proposed in [16] and then extended in [9], as well as the similar framework in [2], meets these desired qualities. In this chapter, we present the RDE framework in a revised and holistic manner. Our generalized RDE framework can be applied to any model (not only classification tasks), supports in-distribution interpretability (by leveraging in-painting GANs), and admits interpretation queries (by considering suitable input signal representations).
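At its core, the RDE principle seeks a sparse mask over the input components such that keeping only the masked components (and randomizing the rest) barely changes the model's output. The following is a minimal sketch of this idea, not the authors' implementation: the "black box" is a toy linear scorer, the masked-out entries are obfuscated with Gaussian noise, and a relaxed mask in [0, 1] is found by projected gradient descent on distortion plus an l1 sparsity penalty. The weights, sparsity level, and step size are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "black box": a linear scorer (assumption for illustration only).
w = np.array([5.0, 0.1, 0.1, 4.0, 0.1])
f = lambda x: float(w @ x)

x = np.ones(5)        # input to explain
target = f(x)

def distortion(s, n_samples=256):
    """Monte Carlo estimate of E[(f(s*x + (1-s)*v) - f(x))^2],
    with the unmasked entries replaced by Gaussian noise v."""
    total = 0.0
    for _ in range(n_samples):
        v = rng.standard_normal(x.shape)
        z = s * x + (1.0 - s) * v
        total += (f(z) - target) ** 2
    return total / n_samples

lam = 2.0             # weight of the l1 sparsity penalty (illustrative)
s = np.full(5, 0.5)   # relaxed mask, entries in [0, 1]
lr = 0.01

for _ in range(300):
    # For the linear toy model the expected distortion has the closed form
    #   (sum_i w_i x_i (s_i - 1))^2 + sum_i w_i^2 (1 - s_i)^2,
    # so we can use its analytic gradient instead of a Monte Carlo one.
    mean_err = np.sum(w * x * (s - 1.0))
    grad = 2.0 * mean_err * (w * x) - 2.0 * (w ** 2) * (1.0 - s) + lam
    s = np.clip(s - lr * grad, 0.0, 1.0)   # projected gradient step

print(np.round(s, 2))
```

The recovered mask concentrates on the components with large weights, i.e. those the model actually relies on; the in-painting GAN variant described in the abstract replaces the Gaussian noise here with samples from an in-painting model so that the obfuscated inputs stay in-distribution.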
Oct-12-2021