REBEL: Reinforcement Learning via Regressing Relative Rewards
–Neural Information Processing Systems
While originally developed for continuous control problems, Proximal Policy Optimization (PPO) has emerged as the workhorse of a variety of reinforcement learning (RL) applications, including the fine-tuning of generative models. Unfortunately, PPO requires multiple heuristics to enable stable convergence (e.g.
Mar-21-2025, 08:49:15 GMT
- Country:
  - Asia > Middle East (0.14)
  - Europe (0.27)
  - North America > United States (0.14)
- Genre:
  - Research Report
    - Experimental Study (1.00)
    - New Finding (0.67)
- Industry:
  - Leisure & Entertainment > Sports > Hockey (1.00)
- Technology: