Artificial-Intelligence Grading Assistance for Handwritten Components of a Calculus Exam
Kortemeyer, Gerd, Caspar, Alexander, Horica, Daria
arXiv.org Artificial Intelligence
We investigate whether contemporary multimodal LLMs can assist with grading open-ended calculus at scale without eroding validity. In a large first-year exam, students' handwritten work was graded by GPT-5 against the same rubric used by teaching assistants (TAs), with fractional credit permitted; TA rubric decisions served as ground truth. We calibrated a human-in-the-loop filter that combines a partial-credit threshold with an Item Response Theory (2PL) risk measure based on the deviation between the AI score and the model-expected score for each student-item pair. Unfiltered AI-TA agreement was moderate, adequate for low-stakes feedback but not for high-stakes use. Confidence filtering made the workload-quality trade-off explicit: under stricter settings, AI delivered human-level accuracy but left roughly 70% of the items to be graded by humans. Psychometric patterns were constrained by low stakes on the open-ended portion, a small set of rubric checkpoints, and occasional misalignment between designated answer regions and where work appeared. Practical adjustments, such as slightly higher weight and protected time, a few rubric-visible substeps, and stronger spatial anchoring, should raise ceiling performance. Overall, calibrated confidence and conservative routing enable AI to reliably handle a sizable subset of routine cases while reserving expert judgment for ambiguous or pedagogically rich responses.
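The routing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the threshold values (`credit_margin`, `risk_limit`), and the reading of the partial-credit threshold as "scores far from both 0 and 1 go to a human" are all assumptions made for illustration. Only the 2PL item response function itself is standard.

```python
import math


def expected_score_2pl(theta: float, a: float, b: float) -> float:
    """Standard 2PL item response function.

    theta: student ability, a: item discrimination, b: item difficulty.
    Returns the model-expected probability of a correct response.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def route(ai_score: float, theta: float, a: float, b: float,
          credit_margin: float = 0.2, risk_limit: float = 0.5) -> str:
    """Return "ai" to accept the AI score, or "human" to send it to a TA.

    ai_score is fractional credit in [0, 1]. Both thresholds are
    hypothetical values, not taken from the paper.
    """
    # Assumed partial-credit rule: scores that are neither near 0 nor
    # near 1 are treated as ambiguous and routed to a human.
    if credit_margin < ai_score < 1.0 - credit_margin:
        return "human"
    # Assumed risk measure: absolute deviation between the AI score and
    # the 2PL-expected score for this student-item pair.
    risk = abs(ai_score - expected_score_2pl(theta, a, b))
    return "human" if risk > risk_limit else "ai"


if __name__ == "__main__":
    # A strong student given full credit on an average item: plausible.
    print(route(1.0, theta=1.0, a=1.5, b=0.0))
    # A weak student given full credit on the same item: surprising.
    print(route(1.0, theta=-2.0, a=1.5, b=0.0))
```

Tightening either threshold sends more items to humans, which is the workload-quality trade-off the abstract describes.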
Nov-14-2025
- Country:
  - Europe
    - Switzerland > Zürich
      - Zürich (0.04)
    - United Kingdom > England
      - West Yorkshire > Leeds (0.04)
  - North America > United States
    - District of Columbia > Washington (0.04)
    - Michigan (0.04)
    - New York (0.04)
- Genre:
  - Instructional Material (0.68)
  - Research Report (0.50)
- Industry:
  - Education
    - Assessment & Standards (0.93)
    - Curriculum > Subject-Specific Education (0.70)
    - Educational Setting (0.66)
- Technology: