LLMs can Perform Multi-Dimensional Analytic Writing Assessments: A Case Study of L2 Graduate-Level Academic English Writing
Wang, Zhengxiang, Makarova, Veronika, Li, Zhi, Kodner, Jordan, Rambow, Owen
arXiv.org Artificial Intelligence
The paper explores the performance of LLMs in the context of multi-dimensional analytic writing assessments, i.e., their ability to provide both scores and comments based on multiple assessment criteria. Using a corpus of literature reviews written by L2 graduate students and assessed by human experts against 9 analytic criteria, we prompt several popular LLMs to perform the same task under various conditions. To evaluate the quality of feedback comments, we apply a novel feedback comment quality evaluation framework. Compared to existing methods that rely on manual judgments, this framework is interpretable, cost-efficient, scalable, and reproducible. We find that LLMs can generate reasonably good and generally reliable multi-dimensional analytic assessments. We release our corpus for reproducibility.
Feb-16-2025
- Country:
  - Asia (0.67)
  - Europe > United Kingdom
    - England (0.28)
  - North America > United States (1.00)
- Genre:
  - Instructional Material (0.93)
  - Overview (1.00)
  - Research Report
    - Experimental Study (0.67)
    - New Finding (1.00)