Comparison of Scoring Rationales Between Large Language Models and Human Raters