Fine-Grained Human Feedback Gives Better Rewards for Language Model Training

Wu, Zeqiu, Hu, Yushi, Shi, Weijia, Dziri, Nouha, Suhr, Alane, Ammanabrolu, Prithviraj, Smith, Noah A., Ostendorf, Mari, Hajishirzi, Hannaneh

Oct-30-2023, arXiv.org Artificial Intelligence

Language models (LMs) often exhibit undesirable text generation behaviors, including generating false, toxic, or irrelevant outputs. Reinforcement learning from human feedback (RLHF), in which human preference judgments on LM outputs are transformed into a learning signal, has recently shown promise in addressing these issues. However, such holistic feedback conveys limited information on long text outputs: it does not indicate which aspects of the outputs influenced user preference, e.g., which parts contain what type(s) of errors. In this paper, we use fine-grained human feedback (e.g., which sentence is false, which sub-sentence is irrelevant) as an explicit training signal. We introduce Fine-Grained RLHF, a framework that enables training and learning from reward functions that are fine-grained in two respects: (1) they are dense, providing a reward after every segment (e.g., a sentence) is generated; and (2) they incorporate multiple reward models associated with different feedback types (e.g., factual incorrectness, irrelevance, and information incompleteness). We conduct experiments on detoxification and long-form question answering to illustrate how learning with such reward functions leads to improved performance, supported by both automatic and human evaluation. Additionally, we show that LM behaviors can be customized using different combinations of fine-grained reward models. We release all data, collected human feedback, and code at https://FineGrainedRLHF.github.io.
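To make the two senses of "fine-grained" concrete, the following is a minimal Python sketch, not the authors' released implementation. It assumes sentence-level segmentation for every feedback type and combines per-type scores as a weighted sum. The scorers, the weights, and all function names (`split_into_segments`, `fine_grained_rewards`, `toy_relevance`, `toy_factuality`) are hypothetical stand-ins for learned reward models.

```python
import re
from typing import Callable, Dict, List

# Hypothetical type: a reward model is any function that scores one text segment.
RewardModel = Callable[[str], float]


def split_into_segments(text: str) -> List[str]:
    """Naive sentence splitter (an assumption; real segmentation may differ)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def fine_grained_rewards(
    text: str,
    reward_models: Dict[str, RewardModel],
    weights: Dict[str, float],
) -> List[float]:
    """Return one reward per segment (density), each a weighted sum over
    several feedback-type-specific reward models (multiple reward models)."""
    rewards = []
    for segment in split_into_segments(text):
        total = sum(
            weights[name] * model(segment)
            for name, model in reward_models.items()
        )
        rewards.append(total)
    return rewards


# Toy keyword-based scorers standing in for learned reward models (illustrative only).
def toy_relevance(segment: str) -> float:
    return -1.0 if "by the way" in segment.lower() else 0.0


def toy_factuality(segment: str) -> float:
    return -1.0 if "flat" in segment.lower() else 0.0


REWARD_MODELS = {"relevance": toy_relevance, "factuality": toy_factuality}
WEIGHTS = {"relevance": 0.5, "factuality": 1.0}  # hypothetical weights

output = "The Earth orbits the Sun. The Earth is flat. By the way, I like tea."
segment_rewards = fine_grained_rewards(output, REWARD_MODELS, WEIGHTS)
print(segment_rewards)  # one value per sentence: [0.0, -1.0, -0.5]
```

A holistic preference reward would collapse this output into a single scalar, whereas the per-segment list shows which sentence was penalized and for which kind of error. Changing the entries in `WEIGHTS` is one simple way to picture how different combinations of reward models could steer behavior.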



Country:
- Atlantic Ocean > Mediterranean Sea (0.04)
- South America > Suriname
  - Commewijne District > Nieuw Amsterdam (0.04)
- Oceania
  - Australia (0.04)
  - New Zealand (0.04)
- North America
  - Dominican Republic (0.04)
  - Belize (0.04)
  - United States
    - New York > Ontario County (0.04)
    - Massachusetts (0.04)
    - Missouri > St. Louis County
      - St. Louis (0.04)
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
    - Florida > Hillsborough County
      - Tampa (0.04)
    - California
      - Los Angeles County > Los Angeles (0.05)
      - San Francisco County > San Francisco (0.04)
      - San Diego County > San Diego (0.04)
      - Alameda County > Berkeley (0.04)
    - Arizona > Maricopa County
      - Phoenix (0.04)
  - Canada
    - Ontario > Toronto (0.14)
    - Nova Scotia (0.04)
    - Quebec > Montreal (0.04)
    - Manitoba
      - Winnipeg Metropolitan Region > Winnipeg (0.04)
      - Central Plains Region > Portage la Prairie (0.04)
- Europe
  - France (0.15)
  - United Kingdom (0.14)
  - Russia (0.04)
  - Spain > Catalonia
    - Barcelona Province > Barcelona (0.04)
  - Netherlands > North Holland
    - Amsterdam (0.04)
  - Italy > Tuscany
    - Florence (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)
  - Croatia > Dubrovnik-Neretva County
    - Dubrovnik (0.04)
- Asia
  - Russia (0.04)
  - Middle East > UAE
    - Abu Dhabi Emirate > Abu Dhabi (0.04)

Genre:
- Research Report (1.00)

Industry:
- Consumer Products & Services > Travel (0.93)
- Education (0.93)
- Transportation
  - Passenger (1.00)
  - Marine (1.00)
- Leisure & Entertainment > Sports
  - Football (1.00)
- Government
  - Military (0.93)
  - Regional Government > North America Government
    - United States Government (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
