Countering Reward Over-optimization in LLM with Demonstration-Guided Reinforcement Learning

Mathieu Rita, Florian Strub, Rahma Chaabouni, Paul Michel, Emmanuel Dupoux, Olivier Pietquin

Apr-30-2024–arXiv.org Artificial Intelligence

While Reinforcement Learning (RL) has proven essential for tuning large language models (LLMs), it can lead to reward over-optimization (ROO). Existing approaches address ROO by adding KL regularization, which requires computationally expensive hyperparameter tuning. Additionally, KL regularization focuses solely on regularizing the language policy, neglecting a potential source of regularization: the reward function itself. Inspired by demonstration-guided RL, we introduce Reward Calibration from Demonstration (RCfD), which leverages human demonstrations and a reward model to recalibrate the reward objective. Formally, given a prompt, the RCfD objective minimizes the distance between the rewards of the demonstrations and those of the LLM's generations, rather than directly maximizing the reward function. This shift in objective avoids incentivizing the LLM to exploit the reward model and promotes more natural and diverse language generation. We demonstrate the effectiveness of RCfD on three language tasks, where it achieves performance comparable to carefully tuned baselines while mitigating ROO.
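To make the contrast between the two objectives concrete, they can be sketched per generation as below. This is a minimal sketch, not the authors' implementation: the function names are hypothetical, reward-model scores and per-token log-probabilities are assumed to be precomputed floats, the KL term uses a simple single-sample estimate, and the squared difference is only one assumed choice of distance for the RCfD objective.

```python
from typing import Sequence


def kl_regularized_reward(
    reward: float,
    logp_policy: Sequence[float],
    logp_ref: Sequence[float],
    beta: float,
) -> float:
    """KL-regularized reward (baseline): r(x, y) - beta * KL estimate.

    The KL term is a single-sample estimate: the sum over generated tokens
    of log pi(y_t) - log pi_ref(y_t). beta is the coefficient that, per the
    abstract, must be tuned carefully.
    """
    if len(logp_policy) != len(logp_ref):
        raise ValueError("log-probability sequences must have equal length")
    kl_estimate = sum(lp - lr for lp, lr in zip(logp_policy, logp_ref))
    return reward - beta * kl_estimate


def rcfd_reward(reward_generated: float, reward_demo: float) -> float:
    """RCfD-style calibrated reward (sketch).

    Instead of maximizing the reward-model score, the policy is rewarded for
    matching the score of a human demonstration for the same prompt.
    ASSUMPTION: squared difference as the distance; the maximum value 0 is
    reached when the two scores coincide.
    """
    return -(reward_generated - reward_demo) ** 2


if __name__ == "__main__":
    demo_score = 1.2  # hypothetical reward-model score of the demonstration
    for score in (1.0, 5.0):  # a natural vs. an over-optimized generation
        print(score, rcfd_reward(score, demo_score))
```

With a demonstration scoring 1.2, a generation scoring 5.0 would win under plain reward maximization, but under the calibrated objective it scores far below a generation at 1.0, so pushing the reward-model score past the demonstration level is no longer rewarded.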
