The Logic Traps in Evaluating Post-hoc Interpretations
Ju, Yiming, Zhang, Yuanzhe, Yang, Zhao, Jiang, Zhongtao, Liu, Kang, Zhao, Jun
arXiv.org Artificial Intelligence
The inscrutability of deep models has grown in tandem with their power (Doshi-Velez and Kim, 2017), which has motivated efforts to interpret how these black-box models work (Sundararajan et al., 2017; Belinkov and Glass, 2019). Post-hoc interpretation aims to explain a trained model and reveal how the model arrives at a decision. This interpretability is achieved by interpreting a trained model in post-hoc ways (Molnar, 2020).

The lack of a clear statement on these traps has damaged the community in the following aspects: First, different evaluation methods may give rise to contradictory conclusions, which has caused many debates, such as the debate over using the magnitudes of attention weights as interpretations for transformer-based models (Wiegreffe and Pinter, 2019; Jain and Wallace, 2019; Pruthi et al.).
Sep-12-2021