Assessing Distractors in Multiple-Choice Tests
Raina, Vatsal, Liusie, Adian, Gales, Mark
– arXiv.org Artificial Intelligence
Multiple-choice tests are a common approach for assessing candidates' comprehension skills. Standard multiple-choice reading comprehension exams require candidates to select the correct answer option from a discrete set, based on a question about a contextual passage. For appropriate assessment, the distractor answer options must, by definition, be incorrect yet plausible and diverse. However, generating good-quality distractors that satisfy these criteria is a challenging task for content creators. We propose automated metrics for assessing the quality of distractors in multiple-choice reading comprehension tests. Specifically, we define quality in terms of the incorrectness, plausibility and diversity of the distractor options. We assess incorrectness using the classification ability of a binary multiple-choice reading comprehension system. Plausibility is assessed by considering the distractor confidence: the probability mass associated with the distractor options under a standard multi-class multiple-choice reading comprehension system. Diversity is assessed by pairwise comparison of an embedding-based equivalence metric between the distractors of a question. To further validate the plausibility metric, we compare it against candidate response distributions over multiple-choice questions and against a ChatGPT model's interpretation of distractor plausibility and diversity.
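As a rough sketch of how the plausibility and diversity metrics described above might be computed, the Python snippet below sums the probability mass a multi-class reading comprehension system places on the distractors, and scores diversity as one minus the mean pairwise cosine similarity of distractor embeddings. The `option_probabilities` stub and the `all-MiniLM-L6-v2` sentence encoder are our assumptions for illustration; the paper's exact models and its embedding-based equivalence metric may differ.

```python
import numpy as np
from sentence_transformers import SentenceTransformer


def option_probabilities(context, question, options):
    """Hypothetical stand-in for the paper's multi-class multiple-choice
    reading comprehension system: it should return a probability
    distribution (numpy array) over the answer options."""
    raise NotImplementedError("replace with a trained MCRC model")


def plausibility(context, question, options, correct_idx):
    """Distractor confidence: the probability mass the MCRC system
    assigns to everything except the correct option."""
    probs = option_probabilities(context, question, options)
    return float(probs.sum() - probs[correct_idx])


def diversity(distractors, embedder):
    """One minus the mean pairwise cosine similarity between distractor
    embeddings; higher values indicate more diverse distractors."""
    emb = embedder.encode(distractors)
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # unit-normalise
    sims = emb @ emb.T                                      # pairwise cosines
    iu = np.triu_indices(len(distractors), k=1)             # unique pairs
    return float(1.0 - sims[iu].mean())


embedder = SentenceTransformer("all-MiniLM-L6-v2")  # encoder choice is ours
print(diversity(["in Paris", "in Berlin", "in the French capital"], embedder))
```

The incorrectness check would plausibly follow the same pattern, with a binary multiple-choice reading comprehension classifier scoring each distractor, though the abstract does not spell out its exact form.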
Nov-8-2023