AITopics | feedback generation

Country: Asia > Singapore (0.04)

Genre: Research Report (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (0.69)
Education > Curriculum > Subject-Specific Education (0.46)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.67)

Neural Information Processing Systems Feb-11-2026, 00:26:40 GMT

34cc2ded6daba59357134c0b9fb06bfe-Paper-Datasets_and_Benchmarks_Track.pdf

buggy program, large language model, machine learning, (18 more...)

Country: Asia > Singapore (0.04)

Genre:

Research Report (0.68)
Workflow (0.49)

Industry:

Law (0.68)
Information Technology > Security & Privacy (0.48)
Education > Curriculum > Subject-Specific Education (0.46)
Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.77)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.48)

Azurmendi, Ekhi, Arregi, Xabier, de Lacalle, Oier Lopez

Automatic Essay Scoring and Feedback Generation in Basque Language Learning

arXiv.org Artificial Intelligence Dec-10-2025

This paper introduces the first publicly available dataset for Automatic Essay Scoring (AES) and feedback generation in Basque, targeting the CEFR C1 proficiency level. The dataset comprises 3,200 essays from HABE, each annotated by expert evaluators with criterion-specific scores covering correctness, richness, coherence, cohesion, and task alignment, and enriched with detailed feedback and error examples. We fine-tune open-source models, including RoBERTa-EusCrawl and Latxa 8B/70B, for both scoring and explanation generation. Our experiments show that encoder models remain highly reliable for AES, while supervised fine-tuning (SFT) of Latxa significantly enhances performance, surpassing state-of-the-art (SoTA) closed-source systems such as GPT-5 and Claude Sonnet 4.5 in scoring consistency and feedback quality. We also propose a novel evaluation methodology for assessing feedback generation, combining automatic consistency metrics with expert-based validation of extracted learner errors. Results demonstrate that the fine-tuned Latxa model produces criterion-aligned, pedagogically meaningful feedback and identifies a wider range of error types than proprietary models. This resource and benchmark establish a foundation for transparent, reproducible, and educationally grounded NLP research in low-resource languages such as Basque.

large language model, machine learning, natural language, (18 more...)

2512.08713

Country:

North America > United States > Florida > Miami-Dade County > Miami (0.04)
Europe > Spain > Basque Country (0.04)
Europe > Faroe Islands > Streymoy > Tórshavn (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Education > Assessment & Standards > Student Performance (0.73)
Education > Curriculum > Subject-Specific Education (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Jordán, Joaquín, Yin, Xavier, Fabros, Melissa, Ranade, Gireeja, Norouzi, Narges

MAGIC: Multi-Agent Argumentation and Grammar Integrated Critiquer

arXiv.org Artificial Intelligence Nov-20-2025

Automated Essay Scoring (AES) and Automatic Essay Feedback (AEF) systems aim to reduce the workload of human raters in educational assessment. However, most existing systems prioritize numerical scoring accuracy over feedback quality and are primarily evaluated on pre-secondary school level writing. This paper presents Multi-Agent Argumentation and Grammar Integrated Critiquer (MAGIC), a framework using five specialized agents to evaluate prompt adherence, persuasiveness, organization, vocabulary, and grammar for both holistic scoring and detailed feedback generation. To support evaluation at the college level, we collated a dataset of Graduate Record Examination (GRE) practice essays with expert-evaluated scores and feedback. MAGIC achieves substantial to near-perfect scoring agreement with humans on the GRE data, outperforming baseline LLMs while providing enhanced interpretability through its multi-agent approach. We also compare MAGIC's feedback generation capabilities against ground truth human feedback and baseline models, finding that MAGIC achieves strong feedback quality and naturalness.

large language model, machine learning, natural language, (21 more...)

2506.13037

Country:

Asia > Middle East > Jordan (0.40)
North America > United States > Oregon > Multnomah County > Portland (0.04)
North America > Mexico > Mexico City > Mexico City (0.04)
(2 more...)

Genre: Research Report > New Finding (0.68)

Industry:

Education > Educational Setting (1.00)
Education > Assessment & Standards (1.00)
Education > Educational Technology > Educational Software > Computer-Aided Assessment (0.89)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence Oct-15-2025

LearnLens: LLM-Enabled Personalised, Curriculum-Grounded Feedback with Educators in the Loop

Zhao, Runcong, Bobrov, Artem, Li, Jiazheng, Aloisi, Cesare, He, Yulan

Effective feedback is essential for student learning but is time-intensive for teachers. We present LearnLens, a modular, LLM-based system that generates personalised, curriculum-aligned feedback in science education. LearnLens comprises three components: (1) an error-aware assessment module that captures nuanced reasoning errors; (2) a curriculum-grounded generation module that uses a structured, topic-linked memory chain rather than traditional similarity-based retrieval, improving relevance and reducing noise; and (3) an educator-in-the-loop interface for customisation and oversight. LearnLens addresses key challenges in existing systems, offering scalable, high-quality feedback that empowers both teachers and students.

large language model, machine learning, natural language, (20 more...)

2507.04295

Country:

North America > United States > Texas > Kleberg County (0.04)
North America > United States > Texas > Chambers County (0.04)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry: Education > Assessment & Standards > Student Performance (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Neural Information Processing Systems Oct-9-2025, 23:02:44 GMT

34cc2ded6daba59357134c0b9fb06bfe-Supplemental-Datasets_and_Benchmarks_Track.pdf

buggy program, dataset, learner, (13 more...)

Country: Asia > Singapore (0.04)

Genre: Research Report (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (0.69)
Government (0.68)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Communications (0.94)
Information Technology > Software (0.93)
(2 more...)

Neural Information Processing Systems Oct-9-2025, 23:02:40 GMT

Hints-In-Browser: Benchmarking Language Models for Programming Feedback Generation

buggy program, inference time, learner, (13 more...)

Country: Asia > Singapore (0.04)

Genre:

Research Report (0.68)
Workflow (0.49)

Industry:

Information Technology > Security & Privacy (0.48)
Education > Curriculum > Subject-Specific Education (0.46)
Education > Educational Setting (0.46)
Education > Educational Technology > Educational Software > Computer Based Training (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.68)

Rüdian, Sylvio, Elsir, Yassin, Kretschmer, Marvin, Cayrou, Sabine, Pinkwart, Niels

Feedback Indicators: The Alignment between Llama and a Teacher in Language Learning

arXiv.org Artificial Intelligence Aug-18-2025

Automated feedback generation has the potential to enhance students' learning progress by providing timely and targeted feedback. Moreover, it can assist teachers in optimizing their time, allowing them to focus on more strategic and personalized aspects of teaching. To generate high-quality, information-rich formative feedback, it is essential first to extract relevant indicators, as these serve as the foundation upon which the feedback is constructed. Teachers often employ feedback criteria grids composed of various indicators that they evaluate systematically. This study examines the initial phase of extracting such indicators from students' submissions of a language learning course using the large language model Llama 3.1. Accordingly, the alignment between indicators generated by the LLM and human ratings across various feedback criteria is investigated. The findings demonstrate statistically significant strong correlations, even in cases involving unanticipated combinations of indicators and criteria. The methodology employed in this paper offers a promising foundation for extracting indicators from students' submissions using LLMs. Such indicators can potentially be utilized to auto-generate explainable and transparent formative feedback in future research.

large language model, machine learning, natural language, (17 more...)

2508.11364

Country:

Europe > Germany > Berlin (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
Europe > France > Île-de-France > Hauts-de-Seine > Nanterre (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Education > Educational Setting > Online (0.69)
Education > Assessment & Standards > Assessment Methods (0.69)
Education > Curriculum > Subject-Specific Education (0.51)
Education > Educational Technology > Educational Software > Computer Based Training (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

arXiv.org Artificial Intelligence Aug-15-2025

FEAT: A Preference Feedback Dataset through a Cost-Effective Auto-Generation and Labeling Framework for English AI Tutoring

Seo, Hyein, Hwang, Taewook, Lee, Yohan, Jung, Sangkeun

In English education tutoring, teacher feedback is essential for guiding students. Recently, AI-based tutoring systems have emerged to assist teachers; however, these systems require high-quality and large-scale teacher feedback data, which is both time-consuming and costly to generate manually. In this study, we propose FEAT, a cost-effective framework for generating teacher feedback, and have constructed three complementary datasets: (1) DIRECT-Manual (DM), where both humans and large language models (LLMs) collaboratively generate high-quality teacher feedback, albeit at a higher cost; (2) DIRECT-Generated (DG), an LLM-only generated, cost-effective dataset with lower quality; and (3) DIRECT-Augmented (DA), primarily based on DG with a small portion of DM added to enhance quality while maintaining cost-efficiency. Experimental results showed that incorporating a small portion of DM (5-10%) into DG leads to superior performance compared to using 100% DM alone.

criteria, large language model, machine learning, (18 more...)

doi: 10.18653/v1/2025.acl-short.45

2506.19325

Country:

North America > Mexico > Mexico City > Mexico City (0.05)
Asia > Thailand > Bangkok > Bangkok (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Education > Educational Setting (1.00)
Education > Assessment & Standards (0.93)
Education > Educational Technology > Educational Software > Computer Based Training (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence Aug-12-2025

Annotating Errors in English Learners' Written Language Production: Advancing Automated Written Feedback Systems

Coyne, Steven, Galvan-Sosa, Diana, Spring, Ryan, Guerraoui, Camélia, Zock, Michael, Sakaguchi, Keisuke, Inui, Kentaro

Recent advances in natural language processing (NLP) have contributed to the development of automated writing evaluation (AWE) systems that can correct grammatical errors. However, while these systems are effective at improving text, they are not optimally designed for language learning. They favor direct revisions, often with a click-to-fix functionality that can be applied without considering the reason for the correction. Meanwhile, depending on the error type, learners may benefit most from simple explanations and strategically indirect hints, especially on generalizable grammatical rules. To support the generation of such feedback, we introduce an annotation framework that models each error's error type and generalizability. For error type classification, we introduce a typology focused on inferring learners' knowledge gaps by connecting their errors to specific grammatical patterns. Following this framework, we collect a dataset of annotated learner errors and corresponding human-written feedback comments, each labeled as a direct correction or hint. With this data, we evaluate keyword-guided, keyword-free, and template-guided methods of generating feedback using large language models (LLMs). Human teachers examined each system's outputs, assessing them on grounds including relevance, factuality, and comprehensibility. We report on the development of the dataset and the comparative performance of the systems investigated.

computational linguistic, large language model, natural language, (17 more...)

doi: 10.1007/978-3-031-98459-4_21

2508.0681

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
(13 more...)

Genre: Research Report (1.00)

Industry: Education > Curriculum > Subject-Specific Education (0.66)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.87)