ExPO: Unlocking Hard Reasoning with Self-Explanation-Guided Reinforcement Learning
