A Rate-Distortion Framework for Explaining Black-box Model Decisions
Stefan Kolek, Duc Anh Nguyen, Ron Levie, Joan Bruna, Gitta Kutyniok
arXiv.org Artificial Intelligence
Powerful machine learning models such as deep neural networks are inherently opaque, which has motivated the numerous explanation methods developed by the research community over the last decade [1, 24, 26, 20, 15, 16, 7, 2]. The meaning and validity of an explanation depend on the underlying principle of the explanation framework. A trustworthy explanation framework must therefore align intuition with mathematical rigor while maintaining maximal flexibility and applicability. We believe the Rate-Distortion Explanation (RDE) framework, first proposed in [16] and then extended in [9], as well as the similar framework in [2], meets these desired qualities. In this chapter, we present the RDE framework in a revised and holistic manner. Our generalized RDE framework can be applied to any model (not only classification tasks), supports in-distribution interpretability (by leveraging in-painting GANs), and admits interpretation queries (by considering suitable input signal representations).
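At its core, the RDE principle seeks a sparse mask over the input components such that keeping only the masked components (and randomizing the rest) barely changes the model's output. The following is a minimal sketch of this idea, not the authors' implementation: the "black box" is a toy linear scorer, the masked-out entries are obfuscated with Gaussian noise, and a relaxed mask in [0, 1] is found by projected gradient descent on distortion plus an l1 sparsity penalty. The weights, sparsity level, and step size are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "black box": a linear scorer (assumption for illustration only).
w = np.array([5.0, 0.1, 0.1, 4.0, 0.1])
f = lambda x: float(w @ x)

x = np.ones(5)        # input to explain
target = f(x)

def distortion(s, n_samples=256):
    """Monte Carlo estimate of E[(f(s*x + (1-s)*v) - f(x))^2],
    with the unmasked entries replaced by Gaussian noise v."""
    total = 0.0
    for _ in range(n_samples):
        v = rng.standard_normal(x.shape)
        z = s * x + (1.0 - s) * v
        total += (f(z) - target) ** 2
    return total / n_samples

lam = 2.0             # weight of the l1 sparsity penalty (illustrative)
s = np.full(5, 0.5)   # relaxed mask, entries in [0, 1]
lr = 0.01

for _ in range(300):
    # For the linear toy model the expected distortion has the closed form
    #   (sum_i w_i x_i (s_i - 1))^2 + sum_i w_i^2 (1 - s_i)^2,
    # so we can use its analytic gradient instead of a Monte Carlo one.
    mean_err = np.sum(w * x * (s - 1.0))
    grad = 2.0 * mean_err * (w * x) - 2.0 * (w ** 2) * (1.0 - s) + lam
    s = np.clip(s - lr * grad, 0.0, 1.0)   # projected gradient step

print(np.round(s, 2))
```

The recovered mask concentrates on the components with large weights, i.e. those the model actually relies on; the in-painting GAN variant described in the abstract replaces the Gaussian noise here with samples from an in-painting model so that the obfuscated inputs stay in-distribution.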
Oct-12-2021