Cross-lingual Human-Preference Alignment for Neural Machine Translation with Direct Quality Optimization

Kaden Uhlig, Joern Wuebker, Raphael Reinauer, John DeNero

Sep-26-2024, arXiv.org Artificial Intelligence

Reinforcement Learning from Human Feedback (RLHF) and derivative techniques like Direct Preference Optimization (DPO) are task-alignment algorithms used to repurpose general, foundational models for specific tasks. We show that applying task-alignment to neural machine translation (NMT) addresses an existing task-data mismatch in NMT, leading to improvements across all languages of a multilingual model, even when task-alignment is only applied to a subset of those languages. We do so by introducing Direct Quality Optimization (DQO), a variant of DPO leveraging a pre-trained translation quality estimation model as a proxy for human preferences, and verify the improvements with both automatic metrics and human evaluation.
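
The idea of using a quality estimation (QE) model as a preference proxy can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how preference pairs are built, so the best-versus-worst selection rule, the function names `build_preference_pair` and `dpo_loss`, and the default `beta` are all assumptions made for illustration. The loss itself is the standard DPO objective, computed here for a single pair from precomputed sequence log-probabilities.

```python
import math


def build_preference_pair(candidates, qe_score):
    """Return (chosen, rejected) from candidate translations.

    Assumption for illustration: the candidate with the highest QE score is
    treated as preferred and the one with the lowest score as rejected.
    `qe_score` is a hypothetical callable mapping a translation to a float.
    """
    ranked = sorted(candidates, key=qe_score)
    return ranked[-1], ranked[0]


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair.

    Inputs are sequence log-probabilities under the model being trained
    (policy) and under the frozen reference model.
    """
    # How much more the policy favors the chosen output than the reference does,
    # relative to the same quantity for the rejected output.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(beta * margin)) written as a numerically stable softplus.
    x = -beta * margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


if __name__ == "__main__":
    # Toy stand-in for a QE model: longer strings score higher (illustrative only).
    chosen, rejected = build_preference_pair(["a", "bb", "ccc"], qe_score=len)
    print(chosen, rejected)
    print(dpo_loss(-1.0, -5.0, -2.0, -2.0, beta=0.1))
```

In a real setup the log-probabilities would come from the NMT model and a frozen copy of it, and `qe_score` would wrap a pre-trained QE model; the loss falls below log 2 once the policy favors the chosen translation more strongly than the reference does.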

Country:
- Oceania > Australia
  - Victoria > Melbourne (0.04)
- North America
  - United States
    - Oregon > Multnomah County
      - Portland (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.04)
    - Massachusetts > Suffolk County
      - Boston (0.04)
  - Mexico > Mexico City
    - Mexico City (0.04)
- Europe
  - Switzerland (0.04)
  - Slovenia (0.04)
  - Czechia > Prague (0.04)
  - Bulgaria
    - Varna Province > Varna (0.04)
    - Sofia City Province > Sofia (0.04)
  - Italy > Tuscany
    - Florence (0.04)
  - Portugal > Lisbon
    - Lisbon (0.04)
  - Middle East
    - Republic of Türkiye > Istanbul Province
      - Istanbul (0.04)
    - Malta > Port Region
      - Southern Harbour District > Valletta (0.04)
  - Sweden > Vaestra Goetaland
    - Gothenburg (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)
  - Belgium > Brussels-Capital Region
    - Brussels (0.04)
  - Poland > Masovia Province
    - Warsaw (0.04)
- Asia
  - Singapore (0.04)
  - Thailand
    - Bangkok > Bangkok (0.04)
    - Phuket > Phuket (0.04)
  - Middle East
    - Israel (0.04)
    - UAE > Abu Dhabi Emirate
      - Abu Dhabi (0.04)
    - Republic of Türkiye > Istanbul Province
      - Istanbul (0.04)

Genre:
- Research Report > New Finding (0.67)

Industry:
- Government (0.67)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Machine Translation (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.54)