A Novel Framework for Augmenting Rating Scale Tests with LLM-Scored Text Data
Watson, Joe, O'Connor, Ivan, Chen, Chia-Wen, Sun, Luning, Luo, Fang, Stillwell, David
–arXiv.org Artificial Intelligence
Psychological assessments are dominated by rating scales, which cannot capture the nuance in natural language. Efforts to supplement them with qualitative text have relied on labelled datasets or expert rubrics, limiting scalability. We introduce a framework that avoids this reliance: large language models (LLMs) score free-text responses with simple prompts to produce candidate LLM items, from which we retain those that yield the most test information when co-calibrated with a baseline scale. Using depression as a case study, we developed and tested the method in upper-secondary students (n=693) and a matched synthetic dataset (n=3,000). Results on held-out test sets showed that augmenting a 19-item scale with LLM items improved its precision, accuracy, and convergent validity. Further, the test information gain matched that of adding as many as 16 rating-scale items. This framework leverages the increasing availability of transcribed language to enhance psychometric measures, with applications in clinical health and beyond.
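The selection step described in the abstract (keep the candidate LLM items that contribute the most test information) can be illustrated with a minimal sketch. The abstract does not name the item response model or the selection rule, so everything here is an assumption made for illustration: a dichotomous two-parameter logistic (2PL) model, ranking by mean Fisher information over a fixed grid of trait levels, and hypothetical item names and parameter values that do not come from the paper.

```python
import math


def item_information(theta, a, b):
    """Fisher information of a 2PL item at trait level theta.

    a is the discrimination and b the difficulty (both assumed known here;
    in practice they would come from calibrating the items on data).
    """
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def mean_information(a, b, theta_grid):
    """Average item information across a grid of trait levels."""
    return sum(item_information(t, a, b) for t in theta_grid) / len(theta_grid)


def select_items(candidates, theta_grid, k):
    """Rank candidate items by mean information and keep the top k."""
    scores = {
        name: mean_information(a, b, theta_grid)
        for name, (a, b) in candidates.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Trait levels from -3 to 3 in steps of 0.1.
theta_grid = [-3.0 + 0.1 * i for i in range(61)]

# Hypothetical (discrimination, difficulty) pairs, not values from the paper.
candidates = {
    "llm_item_1": (1.8, 0.0),
    "llm_item_2": (0.4, 0.5),
    "llm_item_3": (1.2, -1.0),
}

selected, scores = select_items(candidates, theta_grid, k=2)
print(selected)
```

Under a 2PL model, information grows with the square of the discrimination, so the weakly discriminating candidate is the one dropped in this toy example. A fuller treatment would estimate item parameters jointly with the baseline scale and would likely use a polytomous model for graded LLM scores.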
Nov-25-2025
- Country:
- Asia > China (0.28)
- Europe > United Kingdom
- England (0.28)
- Genre:
- Questionnaire & Opinion Survey (0.93)
- Research Report
- Experimental Study (0.93)
- New Finding (0.68)
- Industry:
- Health & Medicine (1.00)
- Government (1.00)
- Education
- Technology: