Beyond Two-Stage Training: Cooperative SFT and RL for LLM Reasoning
