Scaf-GRPO: Scaffolded Group Relative Policy Optimization for Enhancing LLM Reasoning

Open in new window