Iterative Reasoning Preference Optimization

Richard Yuanzhe Pang 1,2 Weizhe Yuan 1,2 He He
