Evaluating and Characterizing Human Rationales

Carton, Samuel, Rathore, Anirudh, Tan, Chenhao

Oct-9-2020, arXiv.org Artificial Intelligence

Two main approaches for evaluating the quality of machine-generated rationales are: 1) using human rationales as a gold standard; and 2) using automated metrics based on how rationales affect model behavior. An open question, however, is how human rationales fare with these automatic metrics. Analyzing a variety of datasets and models, we find that human rationales do not necessarily perform well on these metrics. To unpack this finding, we propose improved metrics to account for model-dependent baseline performance. We then propose two methods to further characterize rationale quality, one based on model retraining and one on using "fidelity curves" to reveal properties such as irrelevance and redundancy. Our work leads to actionable suggestions for evaluating and characterizing rationales.

deep learning, neural network, rationale

Country:
- North America > United States > Colorado (0.14)

Genre:
- Research Report > New Finding (0.69)

Industry:
- Media (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (0.48)
  - Natural Language (1.00)
