Aligning language models with human preferences

Apr-18-2024 – arXiv.org Artificial Intelligence

Language models (LMs) trained on vast quantities of text data can acquire sophisticated skills such as generating summaries, answering questions or generating code. However, they also manifest behaviors that violate human preferences, e.g., they can generate offensive content or falsehoods, or perpetuate social biases. In this thesis, I explore several approaches to aligning LMs with human preferences. First, I argue that aligning LMs can be seen as Bayesian inference: conditioning a prior (base, pretrained LM) on evidence about human preferences (Chapter 2). Conditioning on human preferences can be implemented in numerous ways. In Chapter 3, I investigate the relation between two approaches to finetuning pretrained LMs using feedback given by a scoring function: reinforcement learning from human feedback (RLHF) and distribution matching. I show that RLHF can be seen as a special case of distribution matching, but distribution matching is strictly more general. In Chapter 4, I show how to extend distribution matching to conditional language models. Finally, in Chapter 5, I explore a different route: conditioning an LM on human preferences already during pretraining. I show that involving human feedback from the very start tends to be more effective than using it only during supervised finetuning. Overall, these results highlight the room for alignment techniques different from and complementary to RLHF.
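The abstract states the Bayesian view without formulas, so here is a minimal Python sketch of one standard way to make it concrete. It is an illustration under stated assumptions, not the thesis's own code. It assumes a toy three-outcome "language model" and the commonly used KL-regularised RLHF objective (expected reward minus beta times the KL divergence from the prior). The outcome names, reward values, `beta`, and all function names are hypothetical. Under these assumptions, the maximiser of the objective is the posterior proportional to prior(x) * exp(reward(x) / beta), which is the sense in which the finetuning target can be read as a prior conditioned on evidence.

```python
import math

def posterior(prior, reward, beta):
    """Return p(x) proportional to prior(x) * exp(reward(x) / beta)."""
    weights = {x: p * math.exp(reward[x] / beta) for x, p in prior.items()}
    z = sum(weights.values())  # normalising constant
    return {x: w / z for x, w in weights.items()}

def kl(p, q):
    """KL(p || q) for two distributions over the same finite set."""
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)

def rlhf_objective(policy, prior, reward, beta):
    """Expected reward minus beta * KL(policy || prior)."""
    expected_reward = sum(policy[x] * reward[x] for x in policy)
    return expected_reward - beta * kl(policy, prior)

# Hypothetical toy setup: three possible "completions" of a prompt.
prior = {"polite": 0.5, "neutral": 0.3, "offensive": 0.2}     # pretrained LM
reward = {"polite": 1.0, "neutral": 0.0, "offensive": -2.0}   # scoring function
beta = 0.5                                                    # KL penalty weight

target = posterior(prior, reward, beta)

# Any other policy, such as this hand-picked one, scores lower on the objective.
other = {"polite": 0.6, "neutral": 0.3, "offensive": 0.1}
best_value = rlhf_objective(target, prior, reward, beta)
other_value = rlhf_objective(other, prior, reward, beta)
```

In this toy setting the posterior shifts probability mass away from the low-reward outcome while staying anchored to the prior. A smaller `beta` trusts the scoring function more, and a larger `beta` keeps the result closer to the pretrained model.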

hyperparameter and implementation detail, machine learning, natural language, (19 more...)

Country:
- Oceania > Australia
  - Victoria > Melbourne (0.04)
  - New South Wales > Sydney (0.04)
- North America
  - Dominican Republic (0.04)
  - United States
    - Texas > Travis County
      - Austin (0.04)
    - Michigan > Washtenaw County
      - Ann Arbor (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.14)
    - Maryland > Montgomery County
      - Gaithersburg (0.04)
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
    - Washington > King County
      - Seattle (0.14)
    - Massachusetts
      - Suffolk County > Boston (0.04)
      - Middlesex County > Cambridge (0.04)
    - California
      - San Francisco County > San Francisco (0.13)
      - San Diego County > San Diego (0.04)
      - Santa Clara County > Palo Alto (0.04)
      - San Mateo County > San Mateo (0.04)
    - New York > New York County
      - New York City (0.04)
  - Puerto Rico > San Juan
    - San Juan (0.04)
  - Canada > British Columbia
    - Metro Vancouver Regional District > Vancouver (0.13)
- Europe
  - Germany > Berlin (0.04)
  - France (0.04)
  - Czechia > Prague (0.04)
  - Denmark > Capital Region
    - Copenhagen (0.04)
  - United Kingdom > England
    - Oxfordshire > Oxford (0.04)
    - Cambridgeshire > Cambridge (0.04)
  - Latvia > Lubāna Municipality
    - Lubāna (0.04)
  - Italy
    - Tuscany > Florence (0.04)
    - Sardinia (0.04)
    - Calabria > Catanzaro Province
      - Catanzaro (0.04)
  - Spain > Catalonia
    - Barcelona Province > Barcelona (0.04)
  - Belgium > Brussels-Capital Region
    - Brussels (0.04)
  - Poland > Masovia Province
    - Warsaw (0.04)
- Asia
  - China > Hong Kong (0.04)
  - Macao (0.04)
  - Thailand > Phuket
    - Phuket (0.04)
  - Middle East
    - Jordan (0.04)
    - UAE > Abu Dhabi Emirate
      - Abu Dhabi (0.04)
- Africa > Ethiopia
  - Addis Ababa > Addis Ababa (0.04)

Genre:
- Research Report > New Finding (1.00)
- Instructional Material (1.00)

Industry:
- Government (0.67)
- Education (0.67)
- Information Technology (0.45)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (1.00)
  - Representation & Reasoning > Uncertainty
    - Bayesian Inference (1.00)
  - Machine Learning
    - Neural Networks > Deep Learning (1.00)
    - Learning Graphical Models
      - Directed Networks > Bayesian Learning (0.67)
      - Undirected Networks > Markov Models (0.67)
