Step-Controlled DPO: Leveraging Stepwise Error for Enhanced Mathematical Reasoning