Review for NeurIPS paper: Debugging Tests for Model Explanations

Neural Information Processing Systems 

Weaknesses: Although the paper addresses an important question, the negative results from the user study largely confirm known issues with attribution methods and earlier findings on evaluating interpretation methods. For example, the observation that, in a cooperative setting, humans rely largely on the model prediction while ignoring explanations is described in many HCI papers, including but not limited to "On human predictions with explanations and predictions of machine learning models: A case study on deception detection" by Lai & Tan (FAT* 2019). Many of the empirical assessments have also been carried out in previous papers. I am having a hard time figuring out what new value this paper provides. The authors consider the bug categorization one of the contributions.