Mitigating Forgetting Between Supervised and Reinforcement Learning Yields Stronger Reasoners

Open in new window