REBEL: Reinforcement Learning via Regressing Relative Rewards
Zhaolin Gao, Jonathan D. Chang
Neural Information Processing Systems
While originally developed for continuous control problems, Proximal Policy Optimization (PPO) has emerged as the workhorse of a variety of reinforcement learning (RL) applications, including the fine-tuning of generative models. Unfortunately, PPO requires multiple heuristics to enable stable convergence (e.g.
Feb-14-2026, 19:00:19 GMT
- Country:
  - Asia > Middle East
    - Jordan (0.04)
    - Republic of Türkiye (0.04)
  - Europe
  - North America > United States
    - Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Genre:
  - Research Report
    - Experimental Study (1.00)
    - New Finding (0.67)
- Industry:
  - Leisure & Entertainment > Sports > Hockey (1.00)
- Technology: