corrective feedback
- North America > United States > Arizona > Maricopa County > Tempe (0.04)
- North America > United States > Colorado > Larimer County > Fort Collins (0.04)
- Europe > Czechia > Prague (0.04)
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- North America > United States > Colorado > Denver County > Denver (0.04)
- North America > Canada (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
What to Say and When to Say it: Live Fitness Coaching as a Testbed for Situated Interaction
Panchal, Sunny
Vision-language models have shown impressive progress in recent years. However, existing models are largely limited to turn-based interactions, where each turn must be stepped (i.e., prompted) by the user. Open-ended, asynchronous interactions, where an AI model may proactively deliver timely responses or feedback based on the unfolding situation in real time, remain an open challenge.
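The distinction the abstract draws — responding only when prompted versus deciding on its own when to speak — can be sketched as a timing policy over a streamed situation. The following is a minimal illustrative sketch, not the paper's actual system; the `Observation` fields, `coaching_feedback` function, and the cooldown heuristic are all hypothetical:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Observation:
    timestamp: float           # seconds into the workout
    reps_completed: int
    form_error: Optional[str]  # e.g. "knees past toes", or None if form is fine

def coaching_feedback(obs: Observation, last_spoken: float,
                      cooldown: float = 5.0) -> Optional[str]:
    """Return an utterance if now is a good moment to speak, else None.

    The model is proactive: it is called on every observation and decides
    for itself whether the situation warrants feedback right now.
    """
    if obs.timestamp - last_spoken < cooldown:
        return None  # spoke too recently; don't interrupt
    if obs.form_error is not None:
        return f"Watch your form: {obs.form_error}."
    if obs.reps_completed and obs.reps_completed % 10 == 0:
        return f"Great work, that's {obs.reps_completed} reps!"
    return None  # nothing worth saying yet
```

The key design point is that silence is a valid output: a turn-based model must answer every prompt, whereas a situated coach mostly stays quiet and speaks only when the unfolding state crosses some threshold.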
- North America > United States > California > San Diego County > San Diego (0.04)
- Asia > Indonesia > Bali (0.04)
- Health & Medicine > Consumer Health (0.93)
- Law (0.93)
- Information Technology (0.67)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.95)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)
Large Language Model-Driven Dynamic Assessment of Grammatical Accuracy in English Language Learner Writing
Jaganov, Timur, Blake, John, Villegas, Julián, Carr, Nicholas
This study investigates the potential for Large Language Models (LLMs) to scale up Dynamic Assessment (DA). To facilitate such an investigation, we first developed DynaWrite, a modular, microservices-based grammatical tutoring application that supports multiple LLMs to generate dynamic feedback for learners of English. Initial testing of 21 LLMs revealed GPT-4o and Neural Chat to have the most potential to scale up DA in the language learning classroom. Further testing of these two candidates found that both models performed similarly in their ability to accurately identify grammatical errors in user sentences. However, GPT-4o consistently outperformed Neural Chat in the quality of its DA, generating clear, consistent, and progressively explicit hints. Real-time responsiveness and system stability were also confirmed through detailed performance testing, with GPT-4o exhibiting sufficient speed and stability. This study shows that LLMs can scale up dynamic assessment, enabling it to be delivered to larger groups than is possible in traditional teacher-learner settings.
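The "progressively explicit hints" at the heart of DA follow a graduated-prompt protocol: mediation starts implicit and becomes more explicit only as needed, and the amount of help consumed is itself the assessment signal. A minimal sketch of that loop, with an illustrative hint ladder — `HINT_LADDER` and `run_da_session` are hypothetical names, not DynaWrite's API:

```python
# Graduated hints, ordered from most implicit to most explicit.
# The example sentence and hints are illustrative only.
HINT_LADDER = [
    "There is an error in your sentence. Can you find it?",  # most implicit
    "Look at the verb. Is its tense correct?",
    "The verb should be in the past tense.",
    "Replace 'go' with 'went'.",                             # most explicit
]

def run_da_session(attempt, hints=HINT_LADDER):
    """Offer hints one at a time; return (solved, hints_used).

    `attempt` is a callable standing in for the learner-plus-checker:
    given the current hint, it returns True once the learner's revision
    is correct (in DynaWrite, an LLM would judge the revision).
    Fewer hints used means greater learner independence.
    """
    for hints_used, hint in enumerate(hints, start=1):
        if attempt(hint):
            return True, hints_used
    return False, len(hints)
```

The ordering matters: revealing the fix immediately (the last rung) would turn DA into ordinary error correction, so the loop always exhausts the implicit rungs first.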
- Asia > Japan (0.05)
- Oceania > Australia > Victoria > Melbourne (0.04)
- Europe > United Kingdom > England > West Midlands > Birmingham (0.04)
- (3 more...)
- Research Report > New Finding (1.00)
- Instructional Material (1.00)
- Education > Curriculum > Subject-Specific Education (0.49)
- Education > Assessment & Standards > Student Performance (0.46)
- Education > Educational Setting > Higher Education (0.46)
We thank the reviewers for the detailed comments, suggestions, and a positive assessment of our work. We will correct the color schemes in all figures (R1). We have also made the figure captions cleaner (R3). We have added a description of the setup to the paper. In Fig 5 (left), DisCor actually outperforms Unif(s,a) on these environments. In the revised version of our paper, we shall clarify the details in Section 3 (R2) and make the intuition in the methods section much clearer.
Annotating Errors in English Learners' Written Language Production: Advancing Automated Written Feedback Systems
Coyne, Steven, Galvan-Sosa, Diana, Spring, Ryan, Guerraoui, Camélia, Zock, Michael, Sakaguchi, Keisuke, Inui, Kentaro
Recent advances in natural language processing (NLP) have contributed to the development of automated writing evaluation (AWE) systems that can correct grammatical errors. However, while these systems are effective at improving text, they are not optimally designed for language learning. They favor direct revisions, often with a click-to-fix functionality that can be applied without considering the reason for the correction. Meanwhile, depending on the error type, learners may benefit most from simple explanations and strategically indirect hints, especially on generalizable grammatical rules. To support the generation of such feedback, we introduce an annotation framework that models each error's error type and generalizability. For error type classification, we introduce a typology focused on inferring learners' knowledge gaps by connecting their errors to specific grammatical patterns. Following this framework, we collect a dataset of annotated learner errors and corresponding human-written feedback comments, each labeled as a direct correction or hint. With this data, we evaluate keyword-guided, keyword-free, and template-guided methods of generating feedback using large language models (LLMs). Human teachers examined each system's outputs, assessing them on grounds including relevance, factuality, and comprehensibility. We report on the development of the dataset and the comparative performance of the systems investigated.
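The framework's two annotation axes — error type tied to a grammatical pattern, and generalizability — plus the direct-correction/hint label on each feedback comment can be expressed as a small schema. This is a hedged sketch of what such a record might look like; the field names and the `prefers_hint` heuristic are illustrative guesses, not the released dataset's schema:

```python
from dataclasses import dataclass
from enum import Enum

class FeedbackKind(Enum):
    DIRECT_CORRECTION = "direct"
    HINT = "hint"

@dataclass
class AnnotatedError:
    """One learner error under the annotation framework (field names
    are illustrative, not the paper's actual schema)."""
    span: str            # the erroneous text, e.g. "He go to school"
    error_type: str      # grammatical pattern, e.g. "subject-verb agreement"
    generalizable: bool  # is the error covered by a reusable rule?
    feedback: str        # human-written feedback comment
    feedback_kind: FeedbackKind

def prefers_hint(err: AnnotatedError) -> bool:
    """Heuristic stated in the abstract: errors on generalizable
    grammatical rules benefit most from strategically indirect hints,
    while idiosyncratic errors may warrant a direct correction."""
    return err.generalizable
```

Labeling generalizability separately from error type is what lets a feedback generator choose between revealing the fix and prompting the learner to apply a rule they can reuse.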
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
- (13 more...)