Don't Forget Your Reward Values: Language Model Alignment via Value-based Calibration

Mao, Xin; Li, Feng-Lin; Xu, Huimin; Zhang, Wei; Luu, Anh Tuan

Feb-25-2024–arXiv.org Artificial Intelligence

While Reinforcement Learning from Human Feedback (RLHF) significantly enhances the generation quality of Large Language Models (LLMs), recent studies have raised concerns regarding the complexity and instability associated with the Proximal Policy Optimization (PPO) algorithm, proposing a series of order-based calibration methods as viable alternatives. This paper delves further into current order-based methods, examining their inefficiencies in utilizing reward values and addressing misalignment issues. Building upon these findings, we propose a novel Value-based CaliBration (VCB) method to better align LLMs with human preferences. Experimental results demonstrate that VCB surpasses existing alignment methods on AI assistant and summarization datasets, providing impressive generalizability, robustness, and stability in diverse settings.
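
To make the concern about reward values concrete, the sketch below uses a generic pairwise hinge ranking loss as a stand-in for an order-based calibration objective. It is an illustrative assumption, not the loss from this paper or from any specific prior method, and the function name `order_based_loss` and the sample numbers are hypothetical. It shows that such a loss consults rewards only to decide which response should rank higher, so two reward sets with the same ordering but very different gaps yield the same training signal.

```python
def order_based_loss(logprobs, rewards):
    """Generic pairwise hinge ranking loss (illustrative assumption).

    Penalizes every pair where the lower-reward response receives a
    higher model score. Rewards are used only to determine the order.
    """
    loss = 0.0
    n = len(logprobs)
    for i in range(n):
        for j in range(n):
            if rewards[i] < rewards[j]:
                # Response i should score below response j.
                loss += max(0.0, logprobs[i] - logprobs[j])
    return loss


# Hypothetical model scores for three candidate responses.
logprobs = [-1.0, -1.5, -0.5]

# Two hypothetical reward sets with the same ordering but different gaps.
small_gap = [0.50, 0.51, 0.49]
large_gap = [5.0, 9.0, -3.0]

loss_small = order_based_loss(logprobs, small_gap)
loss_large = order_based_loss(logprobs, large_gap)
print(loss_small, loss_large)
```

Both calls return the same loss because only the ranking of the rewards enters the computation; the size of the gaps is discarded, which is the kind of information a value-based objective would aim to retain.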

Genre:
- Research Report > New Finding (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
