Data-Efficient Domain Adaptation for LLM-based MT using Contrastive Preference Optimization
Inacio Vieira, Antonio Castaldo, James O'Doherty, Sheila Castilho
arXiv.org Artificial Intelligence
Large language models (LLMs) often require adaptation to domain-specific requirements, a process that can be expensive when relying solely on supervised fine-tuning (SFT). We present an empirical study on applying Contrastive Preference Optimization (CPO) to simulate a post-editing workflow for data-efficient domain adaptation. Our approach synthesizes preference pairs by treating the base model's own raw output as the 'rejected' translation and the human-approved translation memory (TM) entry as the 'chosen' one. This method provides direct feedback on the model's current knowledge, guiding it to align with domain-specific standards. Experiments in English-Brazilian Portuguese and English-Korean show that, using just 14.7k preference pairs, the model achieves performance close to that of a model trained on more than 160k samples with SFT, demonstrating significant data efficiency. Although we showcase its effectiveness in machine translation (MT), this application of CPO naturally generalizes to other generative tasks in which a model's initial drafts can serve as a contrastive signal against a gold reference.
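The pair-synthesis step the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' released code: `generate_translation` is a hypothetical stand-in for the base model's decoding call, and the TM targets are assumed to be pre-aligned with the source segments.

```python
def build_preference_pairs(sources, tm_targets, generate_translation):
    """Synthesize CPO preference pairs from a translation memory.

    The base model's own raw output becomes the 'rejected' translation;
    the human-approved TM entry becomes the 'chosen' one.
    """
    pairs = []
    for src, tm_entry in zip(sources, tm_targets):
        draft = generate_translation(src)  # base model's current draft
        if draft == tm_entry:
            # Identical outputs carry no contrastive signal; skip them.
            continue
        pairs.append({"prompt": src, "chosen": tm_entry, "rejected": draft})
    return pairs


# Toy usage with a stub generator standing in for the base model:
demo = build_preference_pairs(
    sources=["The cat sat on the mat."],
    tm_targets=["O gato sentou-se no tapete."],
    generate_translation=lambda s: "O gato sentou no tapete.",
)
```

The resulting list of prompt/chosen/rejected dictionaries matches the record format commonly expected by preference-optimization trainers, so it can feed a CPO training loop directly.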
Nov-3-2025