KNOW How to Make Up Your Mind! Adversarially Detecting and Alleviating Inconsistencies in Natural Language Explanations

Jang, Myeongjun, Majumder, Bodhisattwa Prasad, McAuley, Julian, Lukasiewicz, Thomas, Camburu, Oana-Maria

Jun-5-2023–arXiv.org Artificial Intelligence

While recent work has considerably improved the quality of the natural language explanations (NLEs) that a model generates to justify its predictions, there is very limited research on detecting and alleviating inconsistencies among generated NLEs. In this work, we leverage external knowledge bases to significantly improve on an existing adversarial attack for detecting inconsistent NLEs. We apply our attack to high-performing NLE models and show that models with higher NLE quality do not necessarily generate fewer inconsistencies. Moreover, we propose an off-the-shelf mitigation method that alleviates inconsistencies by grounding the model in external background knowledge. Our method decreases the inconsistencies of previous high-performing NLE models as detected by our attack.

machine learning, natural language, explanation

Country:
- Atlantic Ocean (0.04)
- North America > United States
  - California > San Diego County > San Diego (0.04)
- Europe
  - Austria > Vienna (0.04)
  - United Kingdom > England
    - Oxfordshire > Oxford (0.14)
    - Greater London > London (0.04)

Genre:
- Research Report (1.00)

Industry:
- Transportation (0.69)
- Government (0.50)
- Information Technology > Security & Privacy (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (1.00)
  - Machine Learning (1.00)
  - Representation & Reasoning > Expert Systems (0.34)
