CoTAL: Human-in-the-Loop Prompt Engineering for Generalizable Formative Assessment Scoring
Clayton Cohn, Ashwin T. S., Naveeduddin Mohammed, Gautam Biswas
Large language models (LLMs) have created new opportunities to assist teachers and support student learning. While researchers have explored various prompt engineering approaches in educational contexts, the degree to which these approaches generalize across domains, such as science, computing, and engineering, remains underexplored. In this paper, we introduce Chain-of-Thought Prompting + Active Learning (CoTAL), an LLM-based approach to formative assessment scoring that (1) leverages Evidence-Centered Design (ECD) to align assessments and rubrics with curriculum goals, (2) applies human-in-the-loop prompt engineering to automate response scoring, and (3) incorporates chain-of-thought (CoT) prompting and teacher and student feedback to iteratively refine questions, rubrics, and LLM prompts. Our findings demonstrate that CoTAL improves GPT-4's scoring performance across domains, achieving gains of up to 38.9% over a non-prompt-engineered baseline (i.e., without labeled examples, CoT prompting, or iterative refinement). Teachers and students judge CoTAL to be effective at scoring and explaining responses, and their feedback produces valuable insights that enhance grading accuracy and explanation quality.
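To make the scoring setup concrete, the sketch below illustrates the kind of chain-of-thought prompt with a rubric and a few labeled examples that the abstract describes. It is not the authors' code: the rubric text, the worked examples, and the call_llm parameter are hypothetical placeholders for whatever assessment items and LLM client are actually used.

# Minimal sketch (assumed, not the authors' implementation) of a CoT scoring
# prompt: a rubric, a few teacher-labeled examples, and a request for the
# model to reason before assigning a score. All names below are illustrative.

RUBRIC = (
    "Score 0-2. 2: identifies the independent variable AND cites evidence; "
    "1: identifies the variable without evidence; 0: neither."
)

# Hypothetical teacher-labeled examples used as few-shot demonstrations.
FEW_SHOT = [
    {"response": "Temperature, because the curve rises as it increases.",
     "reasoning": "Names the variable and cites the graph as evidence.",
     "score": 2},
    {"response": "I think it is temperature.",
     "reasoning": "Names the variable but gives no supporting evidence.",
     "score": 1},
]

def build_prompt(student_response: str) -> str:
    """Assemble the rubric, worked examples, and the new response into one prompt."""
    lines = ["You are scoring a formative assessment response.", RUBRIC, ""]
    for ex in FEW_SHOT:
        lines += [f"Response: {ex['response']}",
                  f"Reasoning: {ex['reasoning']}",
                  f"Score: {ex['score']}",
                  ""]
    # Ask for the reasoning first so the score is grounded in an explanation.
    lines += [f"Response: {student_response}", "Reasoning:"]
    return "\n".join(lines)

def score_response(student_response: str, call_llm) -> str:
    """call_llm is a placeholder for any chat-completion client."""
    return call_llm(build_prompt(student_response))

In practice, the rubric and examples would come from the ECD-aligned assessments, and the prompt would be revised across iterations based on teacher and student feedback, which is the human-in-the-loop refinement the paper emphasizes.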
arXiv.org Artificial Intelligence
Aug-15-2025