REBEL: Reinforcement Learning via Regressing Relative Rewards

Mar-20-2026, 20:18:58 GMT – Neural Information Processing Systems

Although originally developed for continuous control problems, Proximal Policy Optimization (PPO) has emerged as the workhorse of a variety of reinforcement learning (RL) applications, including the fine-tuning of generative models. Unfortunately, PPO requires multiple heuristics to enable stable convergence (e.g.

machine learning, natural language, reinforcement learning, (9 more...)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (0.81)
  - Machine Learning > Reinforcement Learning (0.44)