Regularizing Black-box Models for Improved Interpretability

Gregory Plumb, Maruan Al-Shedivat, Eric Xing, Ameet Talwalkar

arXiv.org Machine Learning 

Most work on interpretability in machine learning has focused on designing either inherently interpretable models, which typically trade off interpretability for accuracy, or post-hoc explanation systems, which lack guarantees about their explanation quality. We propose an alternative to these approaches by directly regularizing a black-box model for interpretability at training time. Our approach explicitly connects three key aspects of interpretable machine learning: the model's innate explainability, the explanation system used at test time, and the metrics that measure explanation quality. Our regularization results in substantial (up to orders of magnitude) improvement in explanation fidelity and stability metrics across a range of datasets, models, and black-box explanation systems. Remarkably, our regularizers also slightly improve predictive accuracy on average across the nine datasets we consider. Further, we show that the benefits of our novel regularizers on explanation quality provably generalize to unseen test points.
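To make the idea concrete, the sketch below illustrates one plausible form of the fidelity penalty the abstract describes: measure how well a local linear surrogate reproduces the model in a sampled neighborhood of each point, and add the average surrogate error to the prediction loss. This is an illustrative reconstruction based only on the abstract; the function names (`local_linear_fidelity`, `regularized_loss`), the Gaussian neighborhood sampling, and the parameters `sigma` and `lam` are assumptions, not the authors' actual API or definitions.

```python
import numpy as np

def local_linear_fidelity(model, x, sigma=0.5, n_samples=20, rng=None):
    """Squared error of the best local linear surrogate of `model` near `x`.

    Small values mean the model is locally well explained by a linear fit.
    (Illustrative definition, assumed from the abstract's description.)
    """
    if rng is None:
        rng = np.random.default_rng(0)
    # Sample a Gaussian neighborhood around the query point.
    X = x + sigma * rng.standard_normal((n_samples, x.shape[0]))
    y = model(X)
    # Fit an affine surrogate g(x) = w @ x + b by least squares.
    A = np.hstack([X, np.ones((n_samples, 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(np.mean(resid ** 2))

def regularized_loss(model, X_train, y_train, lam=1.0):
    """Prediction loss plus a neighborhood-fidelity penalty (illustrative)."""
    mse = float(np.mean((model(X_train) - y_train) ** 2))
    fid = float(np.mean([local_linear_fidelity(model, x) for x in X_train]))
    return mse + lam * fid
```

A linear model incurs a numerically zero fidelity penalty, while a curved model pays a positive one, so minimizing the combined objective pushes the black box toward locally explainable behavior without committing to an inherently interpretable model class.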
