What do we expect from Multiple-choice QA Systems?

Nov-20-2020–arXiv.org Artificial Intelligence

The recent success of machine learning systems on various QA datasets could be interpreted as a significant improvement in models' language understanding abilities. However, using various perturbations, multiple recent works have shown that good performance on a dataset might not indicate performance that correlates well with human's expectations from models that "understand" language. In this work we consider a top performing model on several Multiple Choice Question Answering (MCQA) datasets, and evaluate it against a set of expectations one might have from such a model, using a series of zero-information perturbations of the model's inputs. Our results show that the model clearly falls short of our expectations, and motivates a modified training approach that forces the model to better attend to the inputs. We show that the new training paradigm leads to a model that performs on par with the original model while better satisfying our expectations.

artificial intelligence, dataset, us government, (16 more...)

arXiv.org Artificial Intelligence

Nov-20-2020

arXiv.org PDF

Add feedback

Country:
- North America > United States > Louisiana (0.14)

Genre:
- Research Report > New Finding (0.69)

Industry:
- Education (1.00)
- Government > Regional Government
  - North America Government > United States Government (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (1.00)
  - Natural Language (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found