D-GEN: Automatic Distractor Generation and Evaluation for Reliable Assessment of Generative Model

Jun-16-2025–arXiv.org Artificial Intelligence

Evaluating generative models on open-ended generation is challenging because response formats are inconsistent. Multiple-choice (MC) evaluation mitigates this issue, but writing high-quality distractors is time-consuming and labor-intensive. We introduce D-GEN, the first open-source distractor generator model that transforms open-ended data into an MC format. To evaluate distractor quality, we propose two novel methods: (1) ranking alignment, which checks that generated distractors retain the discriminatory power of ground-truth distractors, and (2) entropy analysis, which compares model confidence distributions. Our results show that D-GEN preserves ranking consistency (Spearman's rho 0.99, Kendall's tau 0.94) and closely matches the entropy distribution of ground-truth distractors. Human evaluation further confirms the fluency, coherence, distractiveness, and incorrectness of the generated distractors. Our work advances robust and efficient distractor generation with automated evaluation, setting a new standard for MC evaluation.
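The two evaluation ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the accuracy values are made-up placeholders, the helper names (`ranks`, `spearman_rho`, `kendall_tau`, `entropy`) are hypothetical, and the rank statistics use the simple no-ties formulas. Ranking alignment is approximated here as the rank correlation between model accuracies on ground-truth distractors and on generated distractors; entropy analysis is approximated as the Shannon entropy of a model's probability distribution over the answer options.

```python
import math
from itertools import combinations


def ranks(scores):
    """Return rank positions (1 = highest score). Assumes no tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    result = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def spearman_rho(x, y):
    """Spearman's rank correlation, using the no-ties shortcut formula."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / total pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    total_pairs = len(x) * (len(x) - 1) / 2
    return (concordant - discordant) / total_pairs


def entropy(probs):
    """Shannon entropy (in bits) of a distribution over answer options."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Placeholder accuracies for five hypothetical models (not from the paper).
acc_ground_truth = [0.82, 0.74, 0.69, 0.55, 0.41]  # MC set with original distractors
acc_generated = [0.80, 0.71, 0.73, 0.57, 0.43]     # MC set with generated distractors

print("Spearman rho:", spearman_rho(acc_ground_truth, acc_generated))
print("Kendall tau:", kendall_tau(acc_ground_truth, acc_generated))

# A uniform distribution over four options gives the maximum entropy of 2 bits;
# a confident model concentrates probability mass and scores lower.
print("Entropy (uniform):", entropy([0.25, 0.25, 0.25, 0.25]))
print("Entropy (confident):", entropy([0.85, 0.05, 0.05, 0.05]))
```

In this toy data only the second and third models swap places, so both correlations stay high but fall below 1. Values close to 1, like those the abstract reports, would indicate that the generated distractors order the models almost exactly as the ground-truth distractors do.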

large language model, machine learning, natural language

Country:
- North America > United States (1.00)
- Europe (1.00)
- Africa (1.00)
- Asia > Middle East
  - Republic of Türkiye (0.67)

Genre:
- Research Report
  - New Finding (1.00)
  - Experimental Study (0.93)

Industry:
- Leisure & Entertainment > Sports (1.00)
- Law (1.00)
- Information Technology (0.93)
- Health & Medicine
  - Epidemiology (0.68)
  - Pharmaceuticals & Biotechnology (0.67)
  - Therapeutic Area
    - Infections and Infectious Diseases (1.00)
    - Immunology (1.00)
    - Pulmonary/Respiratory Diseases (0.67)
- Government
  - Military (0.92)
  - Regional Government > North America Government
    - United States Government (1.00)
- Education > Curriculum
  - Subject-Specific Education (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (0.94)
    - Generation (0.84)
  - Machine Learning > Neural Networks
    - Deep Learning (0.94)