Formative Assessment


A Theory of Adaptive Scaffolding for LLM-Based Pedagogical Agents

Cohn, Clayton, Rayala, Surya, Srivastava, Namrata, Fonteles, Joyce Horn, Jain, Shruti, Luo, Xinying, Mereddy, Divya, Mohammed, Naveeduddin, Biswas, Gautam

arXiv.org Artificial Intelligence

Large language models (LLMs) present new opportunities for creating pedagogical agents that engage in meaningful dialogue to support student learning. However, the current use of LLM systems like ChatGPT in classrooms often lacks the solid theoretical foundation found in earlier intelligent tutoring systems. To bridge this gap, we propose a framework that combines Evidence-Centered Design with Social Cognitive Theory for adaptive scaffolding in LLM-based agents focused on STEM+C learning. We illustrate this framework with Inquizzitor, an LLM-based formative assessment agent that integrates human-AI hybrid intelligence and provides feedback grounded in cognitive science principles. Our findings show that Inquizzitor delivers high-quality assessment and interaction aligned with core learning theories, offering teachers effective guidance that students value. This research underscores the potential for theory-driven LLM integration in education, highlighting the ability of these systems to provide adaptive and principled instruction.
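The abstract does not spell out how assessment evidence maps to scaffolding moves, but the idea can be made concrete. Below is a minimal sketch, assuming a hypothetical rubric-score evidence model and three SCT-inspired support levels; all names (ScaffoldLevel, Evidence, choose_scaffold) are illustrative and do not come from the paper.

```python
from dataclasses import dataclass
from enum import Enum

class ScaffoldLevel(Enum):
    MODELING = "model the strategy step by step"
    COACHING = "hint at the next step"
    FADING = "prompt for self-explanation only"

@dataclass
class Evidence:
    """ECD-style evidence: a rubric score in [0, 1] and recent attempt count."""
    rubric_score: float
    attempts: int

def choose_scaffold(ev: Evidence) -> ScaffoldLevel:
    # Low mastery after repeated attempts: model the strategy directly
    # (SCT: learning by observing a competent model).
    if ev.rubric_score < 0.4 and ev.attempts >= 2:
        return ScaffoldLevel.MODELING
    # Partial mastery: coach with hints that keep the learner in charge.
    if ev.rubric_score < 0.8:
        return ScaffoldLevel.COACHING
    # High mastery: fade support to build self-efficacy.
    return ScaffoldLevel.FADING

print(choose_scaffold(Evidence(rubric_score=0.3, attempts=3)).value)
```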


AI-driven formative assessment and adaptive learning in data-science education: Evaluating an LLM-powered virtual teaching assistant

Anaroua, Fadjimata I, Li, Qing, Tang, Yan, Liu, Hong P.

arXiv.org Artificial Intelligence

This paper presents VITA (Virtual Teaching Assistants), an adaptive distributed learning (ADL) platform that embeds a large language model (LLM)-powered chatbot (BotCaptain) to provide dialogic support, interoperable analytics, and integrity-aware assessment for workforce preparation in data science. The platform couples context-aware conversational tutoring with formative-assessment patterns designed to promote reflective reasoning. The paper describes an end-to-end data pipeline that transforms chat logs into Experience API (xAPI) statements, instructor dashboards that surface outliers for just-in-time intervention, and an adaptive pathway engine that routes learners among progression, reinforcement, and remediation content. The paper also benchmarks VITA conceptually against emerging tutoring architectures, including retrieval-augmented generation (RAG)-based assistants and Learning Tools Interoperability (LTI)-integrated hubs, highlighting trade-offs among content grounding, interoperability, and deployment complexity. Contributions include a reusable architecture for interoperable conversational analytics, a catalog of patterns for integrity-preserving formative assessment, and a practical blueprint for integrating adaptive pathways into data-science courses. The paper concludes with implementation lessons and a roadmap (RAG integration, hallucination mitigation, and LTI 1.3 / OpenID Connect) to guide multi-course evaluations and broader adoption. In light of growing demand and scalability constraints in traditional instruction, the approach illustrates how conversational AI can support engagement, timely feedback, and personalized learning at scale. Future work will refine the platform's adaptive intelligence and examine applicability across varied educational settings.
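To make the chat-log-to-xAPI step concrete, here is a minimal sketch of mapping one tutoring turn to an xAPI statement. The activity URL, mbox domain, and field choices are placeholder assumptions, not VITA's actual schema; only the verb URI is a standard ADL vocabulary entry.

```python
import json
from datetime import datetime, timezone

def chat_turn_to_xapi(user_id: str, question_id: str, answer: str, correct: bool) -> dict:
    """Map one tutoring-chat turn to an xAPI statement (illustrative field choices)."""
    return {
        "actor": {"mbox": f"mailto:{user_id}@example.edu", "objectType": "Agent"},
        "verb": {
            "id": "http://adlnet.gov/expapi/verbs/answered",  # standard ADL verb
            "display": {"en-US": "answered"},
        },
        "object": {
            "id": f"https://vita.example.org/questions/{question_id}",  # placeholder IRI
            "objectType": "Activity",
        },
        "result": {"response": answer, "success": correct},
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

print(json.dumps(chat_turn_to_xapi("s123", "q7", "The mean is 4.2", True), indent=2))
```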


CoTAL: Human-in-the-Loop Prompt Engineering for Generalizable Formative Assessment Scoring

Cohn, Clayton, S, Ashwin T, Mohammed, Naveeduddin, Biswas, Gautam

arXiv.org Artificial Intelligence

Large language models (LLMs) have created new opportunities to assist teachers and support student learning. While researchers have explored various prompt engineering approaches in educational contexts, the degree to which these approaches generalize across domains--such as science, computing, and engineering--remains underexplored. In this paper, we introduce Chain-of-Thought Prompting + Active Learning (CoTAL), an LLM-based approach to formative assessment scoring that (1) leverages Evidence-Centered Design (ECD) to align assessments and rubrics with curriculum goals, (2) applies human-in-the-loop prompt engineering to automate response scoring, and (3) incorporates chain-of-thought (CoT) prompting and teacher and student feedback to iteratively refine questions, rubrics, and LLM prompts. Our findings demonstrate that CoTAL improves GPT-4's scoring performance across domains, achieving gains of up to 38.9% over a non-prompt-engineered baseline (i.e., without labeled examples, chain-of-thought prompting, or iterative refinement). Teachers and students judge CoTAL to be effective at scoring and explaining responses, and their feedback produces valuable insights that enhance grading accuracy and explanation quality.
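A rough sketch of what a CoTAL-style scoring prompt might look like, with a rubric, worked chain-of-thought rationales, and the new response appended last. The rubric and examples here are invented for illustration; the paper's actual prompt format may differ.

```python
# Few-shot examples pairing a response, a chain-of-thought rationale, and a score.
FEW_SHOT = [
    {
        "response": "Erosion moves sediment downhill because of gravity and water.",
        "rationale": "Names an agent (water) and a mechanism (gravity moving sediment): meets both criteria.",
        "score": 2,
    },
    {
        "response": "Rocks just break.",
        "rationale": "Vague reference to weathering, no agent or mechanism: meets neither criterion.",
        "score": 0,
    },
]

def build_prompt(rubric: str, student_response: str) -> str:
    """Assemble rubric, worked examples, and the new response into one scoring prompt."""
    parts = [
        f"Rubric:\n{rubric}\n",
        "Score each response and explain your reasoning step by step.\n",
    ]
    for ex in FEW_SHOT:
        parts.append(f"Response: {ex['response']}\nReasoning: {ex['rationale']}\nScore: {ex['score']}\n")
    parts.append(f"Response: {student_response}\nReasoning:")
    return "\n".join(parts)

print(build_prompt("1 pt: names an agent of erosion. 1 pt: explains the mechanism.",
                   "Wind carries small particles away over time."))
```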


The Impact of AI on Educational Assessment: A Framework for Constructive Alignment

Stokkink, Patrick

arXiv.org Artificial Intelligence

The influence of Artificial Intelligence (AI), and specifically Large Language Models (LLMs), on education is continuously increasing. These models are frequently used by students, raising the question of whether current forms of assessment are still a valid way to evaluate student performance and comprehension. The theoretical framework developed in this paper is grounded in Constructive Alignment (CA) theory and Bloom's taxonomy for defining learning objectives. We argue that AI influences learning objectives at different Bloom levels in different ways, and assessment has to be adapted accordingly. Furthermore, in line with Bloom's vision, formative and summative assessment should be aligned on whether the use of AI is permitted or not. Although lecturers tend to agree that education and assessment need to be adapted to the presence of AI, a strong bias exists in the extent to which lecturers want to allow AI in assessment. This bias is driven by a lecturer's familiarity with AI, specifically whether they use it themselves. To avoid this bias, we propose structured guidelines at the university or faculty level to foster alignment among staff. Beyond that, we argue that teaching staff should be trained on the capabilities and limitations of AI tools so that they are better able to adapt their assessment methods.


Learning to Love Edge Cases in Formative Math Assessment: Using the AMMORE Dataset and Chain-of-Thought Prompting to Improve Grading Accuracy

Henkel, Owen, Horne-Robinson, Hannah, Dyshel, Maria, Ch, Nabil, Moreau-Pernet, Baptiste, Abood, Ralph

arXiv.org Artificial Intelligence

This paper introduces AMMORE, a new dataset of 53,000 open-response mathematics question-answer pairs from Rori, a learning platform used by students in several African countries, and conducts two experiments to evaluate the use of large language models (LLMs) for grading particularly challenging student answers. The AMMORE dataset enables a variety of analyses and provides an important resource for researching student math acquisition in understudied, real-world educational contexts. In experiment 1, we use a variety of LLM-driven approaches, including zero-shot, few-shot, and chain-of-thought prompting, to grade the 1% of student answers that a rule-based classifier fails to grade accurately. We find that the best-performing approach, chain-of-thought prompting, accurately scored 92% of these edge cases, effectively boosting the overall accuracy of the grading from 98.7% to 99.9%. In experiment 2, we aim to better understand the consequential validity of the improved grading accuracy by passing grades generated by the best-performing LLM-based approach to a Bayesian Knowledge Tracing (BKT) model, which estimated student mastery of specific lessons. We find that relatively modest improvements in model accuracy at the individual question level can lead to significant changes in the estimation of student mastery. Where the rule-based classifier currently used to grade student answers misclassified the mastery status of 6.9% of students across their completed lessons, the LLM chain-of-thought approach reduced this misclassification rate to 2.6% of students. Taken together, these findings suggest that LLMs could be a valuable tool for grading open-response questions in K-12 mathematics education, potentially encouraging wider adoption of open-ended questions in formative assessment.
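The link from item-level grades to mastery estimates runs through the standard BKT update, which makes it easy to see why a single misgrade matters. A self-contained sketch with illustrative parameter values (the paper's fitted parameters are not given in the abstract):

```python
def bkt_update(p_know: float, correct: bool,
               p_slip: float = 0.1, p_guess: float = 0.2, p_transit: float = 0.15) -> float:
    """One BKT step: Bayesian posterior given the graded answer, then a learning transition."""
    if correct:
        posterior = (p_know * (1 - p_slip)) / (p_know * (1 - p_slip) + (1 - p_know) * p_guess)
    else:
        posterior = (p_know * p_slip) / (p_know * p_slip + (1 - p_know) * (1 - p_guess))
    return posterior + (1 - posterior) * p_transit

# A single misgrade flips the trajectory: compare the same prior under
# a "correct" vs. an "incorrect" observation.
p = 0.5
print(bkt_update(p, correct=True))   # ~0.85: mastery estimate rises sharply
print(bkt_update(p, correct=False))  # ~0.24: mastery estimate falls sharply
```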


A Chain-of-Thought Prompting Approach with LLMs for Evaluating Students' Formative Assessment Responses in Science

Cohn, Clayton, Hutchins, Nicole, Le, Tuan, Biswas, Gautam

arXiv.org Artificial Intelligence

This paper explores the use of large language models (LLMs) to score and explain short-answer assessments in K-12 science. While existing methods can score more structured math and computer science assessments, they often do not provide explanations for the scores. Our study focuses on employing GPT-4 for automated assessment in middle school Earth Science, combining few-shot and active learning with chain-of-thought reasoning. Using a human-in-the-loop approach, we successfully score and provide meaningful explanations for formative assessment responses. A systematic analysis of our method's pros and cons sheds light on the potential for human-in-the-loop techniques to enhance automated grading for open-ended science assessments.
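The abstract does not specify how responses are chosen for human review; one plausible reading of the active-learning step is a diversity heuristic that routes responses unlike the current few-shot pool to the teacher. A toy sketch under that assumption, using a crude bag-of-words similarity:

```python
from collections import Counter
import math

def bag(text: str) -> Counter:
    """Crude bag-of-words representation."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_for_labeling(unlabeled: list[str], few_shot: list[str], k: int = 2) -> list[str]:
    """Return the k responses least covered by the current few-shot pool."""
    def coverage(response: str) -> float:
        return max(cosine(bag(response), bag(ex)) for ex in few_shot)
    return sorted(unlabeled, key=coverage)[:k]

pool = ["Wind erosion moves sand.", "idk", "Glaciers scrape valleys into a U shape."]
print(select_for_labeling(pool, ["Water erosion carries sediment downstream."]))
```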


Using State-of-the-Art Speech Models to Evaluate Oral Reading Fluency in Ghana

Henkel, Owen, Horne-Robinson, Hannah, Hills, Libby, Roberts, Bill, McGrane, Joshua

arXiv.org Artificial Intelligence

This paper reports on a set of three recent experiments utilizing large-scale speech models to evaluate the oral reading fluency (ORF) of students in Ghana. While ORF is a well-established measure of foundational literacy, assessing it typically requires one-on-one sessions between a student and a trained evaluator, a process that is time-consuming and costly. Automating the evaluation of ORF could support better literacy instruction, particularly in education contexts where formative assessment is uncommon due to large class sizes and limited resources. To our knowledge, this research is among the first to examine the use of the most recent versions of large-scale speech models (Whisper V2 and wav2vec 2.0) for ORF assessment in the Global South. We find that Whisper V2 produces transcriptions of Ghanaian students reading aloud with a Word Error Rate (WER) of 13.5. This is close to the model's average WER on adult speech (12.8) and would have been considered state-of-the-art for children's speech transcription only a few years ago. We also find that when these transcriptions are used to produce fully automated ORF scores, they closely align with scores generated by expert human graders, with a correlation coefficient of 0.96. Importantly, these results were achieved on a representative dataset (i.e., students with regional accents, recordings taken in actual classrooms), using a free and publicly available speech model out of the box (i.e., no fine-tuning). This suggests that using large-scale speech models to assess ORF may be feasible to implement and scale in lower-resource, linguistically diverse educational contexts.
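Because the models are used out of the box, the core of such a pipeline is short. Here is a sketch using the open-source openai-whisper and jiwer packages; the file name, reference passage, and the WCPM approximation are illustrative, not the authors' exact scoring procedure.

```python
import whisper  # pip install openai-whisper
import jiwer    # pip install jiwer

# Transcribe one student's reading with the Whisper V2 checkpoint, no fine-tuning.
model = whisper.load_model("large-v2")
result = model.transcribe("student_reading.wav")  # placeholder file name
hypothesis = result["text"]

# Word Error Rate against the passage the student was asked to read.
reference = "The quick brown fox jumps over the lazy dog."  # placeholder passage
error_rate = jiwer.wer(reference, hypothesis)
print("WER:", error_rate)

# Rough words-correct-per-minute (WCPM), a standard ORF score. Treating
# (1 - WER) as the fraction of passage words read correctly is only an
# approximation, since WER also counts insertions.
duration_min = result["segments"][-1]["end"] / 60 if result["segments"] else 1.0
wcpm = max(0.0, len(reference.split()) * (1 - error_rate)) / duration_min
print("WCPM:", wcpm)
```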


Telangana: AI to aid Govt schools in formative assessments

#artificialintelligence

Hyderabad: Select Government school complexes (a cluster of high school, middle, and primary schools) in the State may soon get to use artificial intelligence-based tools that automate some time- and resource-consuming processes such as formative assessments, marking attendance, and logging mid-day meal data, among others. They will even be put to use to teach English and, later, other languages too. These artificial intelligence-based tools will be implemented in select school complexes in Moinabad as a pilot project through the Prof Raj Centre at IIIT-H, which is working to create artificial intelligence technologies and technology solutions for the grassroots. A team from IIIT-H had initial meetings with select school complex headmasters, resource persons, and officials of the Education Department to understand problems at the grassroots level. "We plan to meet the concerned people once again and make specific plans for the technology interventions possible. We want to keep the technologies ready for the coming academic year. These will be short-term projects of three to six months that aim to address the issues at the earliest," said Ramesh Loganathan, Co-Innovation Professor at IIIT-H.


Future of Testing in Education: Artificial Intelligence - Center for American Progress

#artificialintelligence

This series is about the future of testing in America's schools. Part one of the series presents a theory of action for the role assessments should play in schools. Part two, this issue brief, reviews advancements in technology, with a focus on artificial intelligence that can powerfully drive learning in real time. And the third part looks at assessment designs that can improve large-scale standardized tests. Despite the often-negative discussion about testing in schools, assessments are a necessary and useful tool in the teaching and learning process.


AI and Formative Assessment

#artificialintelligence

In my last post, I talked about effective formative assessments and their powerful impact on student learning. In this post, let's explore why AI is well-suited for formative assessment. I think individualized feedback is the most powerful advantage of AI for assessment. As a teacher, I can only be in one place at a time looking in one direction at a time. That means I have two choices for feedback: I can take some time to assess how each student is doing and then address general learning barriers as a class, or I can assess and give feedback to students one at a time.