DISTO: Evaluating Textual Distractors for Multi-Choice Questions using Negative Sampling based Approach

Apr-10-2023–arXiv.org Artificial Intelligence

Multiple-choice questions (MCQs) are an efficient and common way to assess reading comprehension (RC). Every MCQ needs a set of distractor answers that are incorrect but plausible enough to test student knowledge. Distractor generation (DG) models have been proposed, and their performance is typically evaluated using machine translation (MT) metrics. However, MT metrics often misjudge the suitability of generated distractors. We propose DISTO: the first learned evaluation metric for generated distractors. We validate DISTO by showing its scores correlate highly with human ratings of distractor quality. At the same time, DISTO ranks the performance of state-of-the-art DG models very differently from MT-based metrics, showing that MT metrics should not be used for distractor evaluation.

Figure 1: A multi-choice question example from the RACE dataset (Lai et al., 2017). The generated distractors were produced using a T5 model.
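The abstract makes two checkable claims: overlap-based MT metrics can misjudge distractors, and a good distractor metric should correlate with human ratings. The Python sketch below illustrates both on hypothetical toy data. It is not DISTO, whose model and negative-sampling training are not described on this page. The function `unigram_f1` is a simplified token-overlap F1 standing in for n-gram MT metrics (it is not BLEU, ROUGE, or METEOR), and `spearman` computes Spearman's rank correlation, one common way to compare metric scores with human judgments; the correlation measure used in the paper is not stated here. The question, answer, distractors, and ratings are all invented for illustration.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Token-overlap F1 between two strings.

    A simplified stand-in for n-gram MT metrics (not BLEU/ROUGE/METEOR).
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    # Hypothetical RC item: "Why did Tom go to the store?"  Answer: "to buy milk".
    reference_distractor = "to buy bread"
    # The correct answer (an invalid distractor) overlaps heavily with the reference.
    print(round(unigram_f1("to buy milk", reference_distractor), 3))  # 0.667
    # A plausible wrong answer shares no tokens, so it scores zero.
    print(unigram_f1("because he was bored", reference_distractor))  # 0.0
    # Hypothetical human ratings vs. metric scores for five distractors.
    human = [1, 2, 3, 4, 5]
    metric = [0.1, 0.3, 0.2, 0.8, 0.9]
    print(round(spearman(human, metric), 3))  # 0.9
```

Because the correct answer shares two of its three tokens with the reference distractor, the overlap score prefers it over a genuinely plausible wrong answer, which is the kind of misjudgment the abstract attributes to MT metrics.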

artificial intelligence, machine learning, natural language

Country:
- North America
  - Canada > Alberta (0.14)
  - United States
    - Minnesota > Hennepin County
      - Minneapolis (0.14)
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
- Europe
  - France (0.04)
  - Spain > Catalonia
    - Barcelona Province > Barcelona (0.04)
- Asia > Middle East
  - Israel (0.04)

Genre:
- Research Report > New Finding (0.46)

Industry:
- Education > Assessment & Standards > Student Performance (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Text Processing (1.00)
    - Machine Translation (0.89)
  - Machine Learning
    - Statistical Learning (1.00)
    - Neural Networks > Deep Learning (0.95)