f-GRPO and Beyond: Divergence-Based Reinforcement Learning Algorithms for General LLM Alignment
