S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models
