Evaluating LLM-Generated Q&A Test: A Student-Centered Study
Wróblewska, Anna, Grabek, Bartosz, Świstak, Jakub, Dan, Daniel
–arXiv.org Artificial Intelligence
This research presents an automatic pipeline for generating reliable question-answer (Q&A) tests using AI chatbots. We automatically generated a GPT-4o-mini-based Q&A test for a Natural Language Processing course and evaluated its psychometric and perceived-quality metrics with students and experts. A mixed-format IRT analysis showed that the generated items exhibit strong discrimination and appropriate difficulty, while student and expert star ratings reflect high overall quality. A uniform DIF check identified two items for review. These findings demonstrate that LLM-generated assessments can match human-authored tests in psychometric performance and user satisfaction, illustrating a scalable approach to AI-assisted assessment development.
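The discrimination and difficulty figures mentioned above come from item response theory. In the common two-parameter logistic (2PL) model, the probability that a student with ability theta answers an item correctly is sigmoid(a * (theta - b)), where a is the item's discrimination and b its difficulty. The sketch below, which is illustrative rather than the paper's actual analysis (the study uses a mixed-format IRT model, and the simulated data and `fit_2pl_item` helper here are assumptions), shows how these two parameters can be recovered for a single item when respondent abilities are treated as known:

```python
import math
import random

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def fit_2pl_item(thetas, responses, lr=0.1, iters=1500):
    """Fit discrimination (a) and difficulty (b) for one item by gradient
    ascent on the 2PL Bernoulli log-likelihood,
        P(correct | theta) = sigmoid(a * (theta - b)),
    with the respondent abilities (thetas) held fixed."""
    a, b = 1.0, 0.0
    n = len(thetas)
    for _ in range(iters):
        grad_a = grad_b = 0.0
        for theta, y in zip(thetas, responses):
            p = sigmoid(a * (theta - b))
            grad_a += (y - p) * (theta - b)  # d logL / da
            grad_b += (y - p) * (-a)         # d logL / db
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b

# Simulate 1000 students answering one item with known true parameters.
random.seed(0)
true_a, true_b = 1.5, -0.5
thetas = [random.gauss(0.0, 1.0) for _ in range(1000)]
responses = [1 if random.random() < sigmoid(true_a * (t - true_b)) else 0
             for t in thetas]

a_hat, b_hat = fit_2pl_item(thetas, responses)
print(f"a = {a_hat:.2f}, b = {b_hat:.2f}")
```

Under this model, a discrimination well above zero means the item separates low- from high-ability students, and a difficulty near the cohort's mean ability indicates an appropriately pitched item; these are the properties the abstract's "strong discrimination and appropriate difficulty" refers to.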
Aug-8-2025
- Country:
- Europe
- Austria > Vienna (0.14)
- Poland > Masovia Province > Warsaw (0.04)
- Switzerland (0.04)
- North America > United States (0.04)
- Genre:
- Research Report
- Experimental Study (0.69)
- New Finding (0.89)
- Industry:
- Education (1.00)
- Information Technology > Security & Privacy (0.46)
- Technology: