PORT: Preference Optimization on Reasoning Traces