Preference Optimization for Reasoning with Pseudo Feedback
