Reckoning with the Disagreement Problem: Explanation Consensus as a Training Objective

Schwarzschild, Avi, Cembalest, Max, Rao, Karthik, Hines, Keegan, Dickerson, John

Mar-23-2023–arXiv.org Artificial Intelligence

As neural networks increasingly make critical decisions in high-stakes settings, monitoring and explaining their behavior in an understandable and trustworthy manner is a necessity. One commonly used type of explainer is post hoc feature attribution, a family of methods for giving each feature in an input a score corresponding to its influence on a model's output. A major limitation of this family of explainers in practice is that they can disagree on which features are more important than others. Our contribution in this paper is a method of training models with this disagreement problem in mind. We do this by introducing a Post hoc Explainer Agreement Regularization (PEAR) loss term alongside the standard term corresponding to accuracy, an additional term that measures the difference in feature attribution between a pair of explainers. We observe on three datasets that we can train a model with this loss term to improve explanation consensus on unseen data, and see improved consensus between explainers other than those used in the loss term. We examine the trade-off between improved consensus and model performance. And finally, we study the influence our method has on feature attribution explanations.

artificial intelligence, explainer, machine learning, (16 more...)

arXiv.org Artificial Intelligence

Mar-23-2023

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - California (0.05)
  - New York
    - Richmond County > New York City (0.04)
    - Queens County > New York City (0.04)
    - New York County > New York City (0.04)
    - Kings County > New York City (0.04)
    - Bronx County > New York City (0.04)
  - Maryland > Prince George's County
    - College Park (0.04)

Genre:
- Research Report (1.00)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Neural Networks (1.00)
  - Statistical Learning (0.93)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found