Probability-Consistent Preference Optimization for Enhanced LLM Reasoning