Challenges in Explanation Quality Evaluation
Hendrik Schuff, Heike Adel, Peng Qi, Ngoc Thang Vu
arXiv.org Artificial Intelligence
While much research has focused on producing explanations, it remains unclear how the quality of those explanations can be evaluated in a meaningful way. Today's predominant approach is to quantify explanations using proxy scores that compare them to (human-annotated) gold explanations. This approach assumes that explanations which reach higher proxy scores will also provide a greater benefit to human users. In this paper, we present problems of this approach. Concretely, we (i) formulate desired characteristics of explanation quality, (ii) describe how current evaluation practices violate them, and (iii) support our argumentation with initial evidence from a crowdsourcing case study in which we investigate the explanation quality of state-of-the-art explainable question answering systems. We find that proxy scores correlate poorly with human quality ratings and, additionally, become less expressive the more often they are used (i.e., following Goodhart's law). Finally, we propose guidelines to enable a meaningful evaluation of explanations to drive the development of systems that provide tangible benefits to human users.
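The claim that proxy scores correlate poorly with human quality ratings can be made concrete with a rank correlation. The following is a minimal sketch, not the authors' analysis: the function names are hypothetical, the choice of Spearman's rho is an assumption, and the numbers are made up purely for illustration.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical per-system values, invented for this illustration only.
proxy_f1 = [0.62, 0.71, 0.75, 0.80, 0.84]   # proxy score against gold explanations
human_rating = [3.9, 3.1, 4.2, 3.0, 3.4]    # mean human quality rating

rho = spearman(proxy_f1, human_rating)
print(f"Spearman rho = {rho:.2f}")
```

A rho near 1 would mean the proxy score ranks systems the way human raters do. A value near zero or below, as in this invented data, would mean that ranking systems by the proxy score says little about which explanations people actually find better.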
Mar-9-2023
- Country:
- Africa
- Botswana > Kalahari Desert (0.04)
- Namibia > Kalahari Desert (0.04)
- South Africa > Kalahari Desert (0.04)
- Southern Africa (0.04)
- Asia
- China
- Guangdong Province > Shenzhen (0.04)
- Hong Kong (0.04)
- Japan > Honshū
- Kantō
- Kanagawa Prefecture > Yokohama (0.04)
- Tokyo Metropolis Prefecture > Tokyo (0.14)
- Malaysia (0.04)
- Europe
- Belgium > Brussels-Capital Region
- Brussels (0.04)
- Denmark > Capital Region
- Copenhagen (0.04)
- Germany
- Baden-Württemberg > Stuttgart Region
- Stuttgart (0.04)
- Bavaria > Upper Bavaria
- Munich (0.04)
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Italy
- Apulia > Bari (0.04)
- Sardinia > Cagliari (0.04)
- Trentino-Alto Adige/Südtirol > Trentino Province
- Trento (0.04)
- Tuscany > Florence (0.04)
- Spain > Catalonia
- Barcelona Province > Barcelona (0.04)
- United Kingdom > Scotland
- City of Glasgow > Glasgow (0.04)
- North America
- Canada > Quebec
- Montreal (0.04)
- Dominican Republic (0.04)
- United States
- New York > New York County
- New York City (0.04)
- California
- Los Angeles County
- Long Beach (0.04)
- Los Angeles (0.04)
- San Francisco County > San Francisco (0.14)
- Santa Clara County > Palo Alto (0.04)
- Missouri > St. Louis County
- St. Louis (0.04)
- Washington > King County
- Seattle (0.14)
- Georgia > Fulton County
- Atlanta (0.04)
- Pennsylvania > Philadelphia County
- Philadelphia (0.04)
- Massachusetts > Suffolk County
- Boston (0.04)
- Texas
- Brazos County > College Station (0.04)
- Dallas County > Dallas (0.04)
- Travis County > Austin (0.04)
- Minnesota > Hennepin County
- Minneapolis (0.14)
- Oceania > Australia
- New South Wales > Sydney (0.04)
- Genre:
- Research Report > New Finding (0.93)
- Industry:
- Education (1.00)
- Health & Medicine (1.00)
- Technology: