Self-Evolved Preference Optimization for Enhancing Mathematical Reasoning in Small Language Models
