GRPO-$λ$: Credit Assignment improves LLM Reasoning
