Beyond Human-Only: Evaluating Human-Machine Collaboration for Collecting High-Quality Translation Data

Zhongtao Liu, Parker Riley, Daniel Deutsch, Alison Lui, Mengmeng Niu, Apu Shah, Markus Freitag

Oct-14-2024–arXiv.org Artificial Intelligence

Collecting high-quality translations is crucial for the development and evaluation of machine translation systems. However, traditional human-only approaches are costly and slow. This study presents a comprehensive investigation of 11 approaches for acquiring translation data, including human-only, machine-only, and hybrid approaches. Our findings demonstrate that human-machine collaboration can match or even exceed the quality of human-only translations while being more cost-efficient. Error analysis reveals the complementary strengths of human and machine contributions, highlighting the effectiveness of collaborative methods. Cost analysis further demonstrates the economic benefits of human-machine collaboration, with some approaches achieving top-tier quality at around 60% of the cost of traditional methods. We release a publicly available dataset containing nearly 18,000 segments of varying translation quality with corresponding human ratings to facilitate future research.

artificial intelligence, natural language, translation
