Fairwashing Explanations with Off-Manifold Detergent

Anders, Christopher J., Pasliev, Plamen, Dombrowski, Ann-Kathrin, Müller, Klaus-Robert, Kessel, Pan

Jul-20-2020, arXiv.org Machine Learning

Explanation methods promise to make black-box classifiers more transparent. As a result, it is hoped that they can serve as proof of a sensible, fair, and trustworthy decision-making process and thereby increase the algorithm's acceptance by end users. In this paper, we show both theoretically and experimentally that these hopes are presently unfounded. Specifically, we show that, for any classifier $g$, one can always construct another classifier $\tilde{g}$ which has the same behavior on the data (same train, validation, and test error) but has arbitrarily manipulated explanation maps. We derive this statement theoretically using differential geometry and demonstrate it experimentally for various explanation methods, architectures, and datasets. Motivated by our theoretical insights, we then propose a modification of existing explanation methods which makes them significantly more robust.
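The central claim can be illustrated with a toy sketch in plain Python. Everything in it is an illustrative assumption rather than the authors' construction: the data are taken to lie on the line $x_2 = 0$ in the plane, $g$ is a linear score, and $\tilde{g}$ adds a term that vanishes on that line but changes the gradient in the off-manifold direction. The abstract does not spell out the proposed modification; the tangent-space projection shown at the end is an assumption suggested by the title's "off-manifold" framing, included only to show why restricting an explanation to on-manifold directions would remove this particular manipulation.

```python
# Toy setting (all names and numbers are hypothetical):
# data points lie on the line x2 = 0 inside R^2, the "data manifold".

def g(x):
    """Original classifier score: depends only on the on-manifold coordinate x1."""
    x1, x2 = x
    return 2.0 * x1

def g_tilde(x, strength=5.0):
    """Manipulated classifier: adds a term that is zero wherever x2 == 0."""
    x1, x2 = x
    return g(x) + strength * x2

def gradient(f, x, eps=1e-6):
    """Central finite-difference gradient, used here as a simple explanation map."""
    grad = []
    for i in range(len(x)):
        hi, lo = list(x), list(x)
        hi[i] += eps
        lo[i] -= eps
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad

def project_on_tangent(grad, tangent):
    """Keep only the gradient component along a unit tangent vector of the manifold."""
    dot = sum(a * b for a, b in zip(grad, tangent))
    return [dot * t for t in tangent]

data = [(-1.0, 0.0), (0.5, 0.0), (2.0, 0.0)]

# Both classifiers agree on every data point, so all error metrics coincide.
same_outputs = all(g(x) == g_tilde(x) for x in data)

# Their gradient explanations differ in the off-manifold coordinate x2.
expl_g = gradient(g, data[1])
expl_g_tilde = gradient(g_tilde, data[1])

# Assumed fix: project each explanation onto the manifold's tangent direction.
tangent = (1.0, 0.0)
proj_g = project_on_tangent(expl_g, tangent)
proj_g_tilde = project_on_tangent(expl_g_tilde, tangent)

print("same outputs on data:", same_outputs)
print("raw explanations:", expl_g, expl_g_tilde)
print("projected explanations:", proj_g, proj_g_tilde)
```

In this sketch the two models are indistinguishable on the data, yet the raw gradient of `g_tilde` carries an extra off-manifold component that can be set to any value through `strength`. After projection, the two explanations agree up to finite-difference error. In realistic settings the tangent directions are not known in advance and would have to be estimated, which this toy example does not attempt.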

explanation, machine learning, natural language

Country:
- Oceania > Australia
  - New South Wales > Sydney (0.04)
- North America
  - United States
    - Hawaii > Honolulu County
      - Honolulu (0.04)
    - California > San Diego County
      - San Diego (0.04)
  - Canada
    - Quebec > Montreal (0.04)
    - Alberta > Census Division No. 15
      - Improvement District No. 9 > Banff (0.04)
- Europe
  - Austria > Vienna (0.14)
  - Spain > Andalusia
    - Granada Province > Granada (0.04)
  - Italy > Marche
    - Ancona Province > Ancona (0.04)
  - Germany > Saarland
    - Saarbrücken (0.04)
- Asia > South Korea
  - Seoul > Seoul (0.04)

Genre:
- Research Report (1.00)

Industry:
- Education (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (0.93)
  - Representation & Reasoning (0.67)
  - Machine Learning
    - Neural Networks > Deep Learning (0.93)
    - Statistical Learning > Regression (0.68)
