Towards Robust Interpretability with Self-Explaining Neural Networks

Dec-31-2018–Neural Information Processing Systems

Most recent work on interpretability of complex machine learning models has focused on estimating a-posteriori explanations for previously trained models around specific predictions. Self-explaining models where interpretability plays a key role already during learning have received much less attention. We propose three desiderata for explanations in general -- explicitness, faithfulness, and stability -- and show that existing methods do not satisfy them. In response, we design self-explaining models in stages, progressively generalizing linear classifiers to complex yet architecturally explicit models. Faithfulness and stability are enforced via regularization specifically tailored to such models. Experimental results across various benchmark datasets show that our framework offers a promising direction for reconciling model complexity and interpretability.

artificial intelligence, explanation, machine learning, (17 more...)

Neural Information Processing Systems

Dec-31-2018

Conferences PDF

Add feedback

Country:
- North America > United States (0.28)

Industry:
- Health & Medicine > Therapeutic Area (0.46)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Neural Networks (1.00)
  - Statistical Learning (0.88)

Duplicate Docs Excel Report

Title
Towards Robust Interpretability with Self-Explaining Neural Networks
Towards Robust Interpretability with Self-Explaining Neural Networks

Similar Docs Excel Report more

Title	Similarity	Source
None found