MDPO: Multi-Granularity Direct Preference Optimization for Mathematical Reasoning
