Comparison of Scoring Rationales Between Large Language Models and Human Raters
