Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations
