Preference Optimization for Reasoning with Pseudo Feedback