Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback
