On Reinforcement Learning and Distribution Matching for Fine-Tuning Language Models with no Catastrophic Forgetting

Korbak, Tomasz, Elsahar, Hady, Kruszewski, Germán, Dymetman, Marc

Nov-14-2022, arXiv.org Artificial Intelligence

The availability of large pre-trained models is changing the landscape of Machine Learning research and practice, moving it from a training-from-scratch paradigm to a fine-tuning one. While in some applications the goal is to "nudge" the pre-trained distribution towards preferred outputs, in others it is to steer it towards a different distribution over the sample space. Two main paradigms have emerged to tackle this challenge: Reward Maximization (RM) and, more recently, Distribution Matching (DM). RM applies standard Reinforcement Learning (RL) techniques, such as Policy Gradients, to gradually increase the reward signal. DM instead first makes explicit the target distribution that the model is then fine-tuned to approximate. Here we explore the theoretical connections between the two paradigms, and show that methods such as KL-control, developed for RM, can also be construed as belonging to DM. We further observe that while DM differs from RM, it can suffer from similar training difficulties, such as high gradient variance. We leverage the connections between the two paradigms to import the concept of a baseline into DM methods. We empirically validate the benefits of adding a baseline on an array of controllable language generation tasks, such as constraining topic, sentiment, and gender distributions in texts sampled from a language model. With a baseline, we observe superior performance in terms of constraint satisfaction, stability, and sample efficiency.
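To make the role of a baseline concrete, the following is a minimal sketch of the textbook score-function (REINFORCE-style) gradient estimator on a toy problem. It is not the paper's DM estimator: the three-outcome softmax policy, the reward values, and the helper names (`softmax`, `grad_log_prob`, `estimator_stats`) are all assumptions made for illustration. Because the toy sample space is tiny, the mean and total variance of the estimator are computed exactly by enumeration rather than by sampling.

```python
import math


def softmax(logits):
    """Convert logits to a categorical distribution."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_prob(probs, x):
    """Gradient of log pi(x) w.r.t. the logits of a softmax policy: onehot(x) - probs."""
    return [(1.0 if i == x else 0.0) - p for i, p in enumerate(probs)]


def estimator_stats(logits, rewards, baseline=0.0):
    """Exact mean and total variance of g(x) = (R(x) - baseline) * grad log pi(x), x ~ pi."""
    probs = softmax(logits)
    k = len(probs)
    mean = [0.0] * k
    second_moment = 0.0
    for x, p in enumerate(probs):
        g = [(rewards[x] - baseline) * d for d in grad_log_prob(probs, x)]
        for i in range(k):
            mean[i] += p * g[i]
        second_moment += p * sum(v * v for v in g)
    # Total variance = E[||g||^2] - ||E[g]||^2 (sum of per-coordinate variances).
    variance = second_moment - sum(m * m for m in mean)
    return mean, variance


# Hypothetical toy setup: uniform policy, rewards with a large common offset.
logits = [0.0, 0.0, 0.0]
rewards = [10.0, 11.0, 12.0]
expected_reward = sum(p * r for p, r in zip(softmax(logits), rewards))

mean_plain, var_plain = estimator_stats(logits, rewards)
mean_base, var_base = estimator_stats(logits, rewards, baseline=expected_reward)

print("no baseline:  ", mean_plain, var_plain)
print("with baseline:", mean_base, var_base)
```

In this toy setting the two estimators have the same expected gradient, since the expectation of the score function is zero, while subtracting the expected reward removes the common offset and cuts the total variance from roughly 80.9 to roughly 0.22. The expected reward is used here only as a simple, common choice of baseline; it is not claimed to be the variance-optimal one, nor the one used in the paper.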

machine learning, natural language, reinforcement learning

Genre:
- Personal (1.00)
- Research Report > New Finding (0.45)
- Instructional Material > Course Syllabus & Notes (0.45)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Natural Language (1.00)
  - Machine Learning
    - Reinforcement Learning (1.00)
    - Neural Networks > Deep Learning (0.93)
    - Learning Graphical Models > Undirected Networks
      - Markov Models (0.45)