The Logic Traps in Evaluating Post-hoc Interpretations
Ju, Yiming, Zhang, Yuanzhe, Yang, Zhao, Jiang, Zhongtao, Liu, Kang, Zhao, Jun
arXiv.org Artificial Intelligence
The inscrutability of deep models has grown in tandem with their power (Doshi-Velez and Kim, 2017), which has motivated efforts to interpret how these black-box models work (Sundararajan et al., 2017; Belinkov and Glass, 2019). Post-hoc interpretation aims to explain a trained model and reveal how the model arrives at a decision. This interpretability is achieved by interpreting a trained model in post-hoc ways (Molnar, 2020).

The lack of a clear statement on these traps has damaged the community in the following aspects: First, different evaluation methods may give rise to contradictory conclusions, which has caused many debates, such as the debate over using the magnitudes of attention weights as interpretations for transformer-based models (Wiegreffe and Pinter, 2019; Jain and Wallace, 2019; Pruthi et al.).
Sep-12-2021