Calibrating LLMs with Preference Optimization on Thought Trees for Generating Rationale in Science Question Scoring