$λ$-GRPO: Unifying the GRPO Frameworks with Learnable Token Preferences
