CoTAL: Human-in-the-Loop Prompt Engineering for Generalizable Formative Assessment Scoring
Clayton Cohn, Ashwin T. S., Naveeduddin Mohammed, Gautam Biswas
Large language models (LLMs) have created new opportunities to assist teachers and support student learning. While researchers have explored various prompt engineering approaches in educational contexts, the degree to which these approaches generalize across domains, such as science, computing, and engineering, remains underexplored. In this paper, we introduce Chain-of-Thought Prompting + Active Learning (CoTAL), an LLM-based approach to formative assessment scoring that (1) leverages Evidence-Centered Design (ECD) to align assessments and rubrics with curriculum goals, (2) applies human-in-the-loop prompt engineering to automate response scoring, and (3) incorporates chain-of-thought (CoT) prompting and teacher and student feedback to iteratively refine questions, rubrics, and LLM prompts. Our findings demonstrate that CoTAL improves GPT-4's scoring performance across domains, achieving gains of up to 38.9% over a non-prompt-engineered baseline (i.e., without labeled examples, CoT prompting, or iterative refinement). Teachers and students judge CoTAL to be effective at scoring and explaining responses, and their feedback produces valuable insights that enhance grading accuracy and explanation quality.
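To make the scoring setup concrete, the sketch below illustrates the kind of chain-of-thought prompt with a rubric and a few labeled examples that the abstract describes. It is not the authors' code: the rubric text, the worked examples, and the call_llm parameter are hypothetical placeholders for whatever assessment items and LLM client are actually used.

# Minimal sketch (assumed, not the authors' implementation) of a CoT scoring
# prompt: a rubric, a few teacher-labeled examples, and a request for the
# model to reason before assigning a score. All names below are illustrative.

RUBRIC = (
    "Score 0-2. 2: identifies the independent variable AND cites evidence; "
    "1: identifies the variable without evidence; 0: neither."
)

# Hypothetical teacher-labeled examples used as few-shot demonstrations.
FEW_SHOT = [
    {"response": "Temperature, because the curve rises as it increases.",
     "reasoning": "Names the variable and cites the graph as evidence.",
     "score": 2},
    {"response": "I think it is temperature.",
     "reasoning": "Names the variable but gives no supporting evidence.",
     "score": 1},
]

def build_prompt(student_response: str) -> str:
    """Assemble the rubric, worked examples, and the new response into one prompt."""
    lines = ["You are scoring a formative assessment response.", RUBRIC, ""]
    for ex in FEW_SHOT:
        lines += [f"Response: {ex['response']}",
                  f"Reasoning: {ex['reasoning']}",
                  f"Score: {ex['score']}",
                  ""]
    # Ask for the reasoning first so the score is grounded in an explanation.
    lines += [f"Response: {student_response}", "Reasoning:"]
    return "\n".join(lines)

def score_response(student_response: str, call_llm) -> str:
    """call_llm is a placeholder for any chat-completion client."""
    return call_llm(build_prompt(student_response))

In practice, the rubric and examples would come from the ECD-aligned assessments, and the prompt would be revised across iterations based on teacher and student feedback, which is the human-in-the-loop refinement the paper emphasizes.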
arXiv.org Artificial Intelligence
Aug-15-2025