Together We Can: Multilingual Automatic Post-Editing for Low-Resource Languages
Deoghare, Sourabh, Kanojia, Diptesh, Bhattacharyya, Pushpak
–arXiv.org Artificial Intelligence
This exploratory study investigates the potential of multilingual Automatic Post-Editing (APE) systems to enhance the quality of machine translations for low-resource Indo-Aryan languages. Focusing on two closely related language pairs, English-Marathi and English-Hindi, we exploit the linguistic similarities to develop a robust multilingual APE model. To facilitate cross-linguistic transfer, we generate synthetic Hindi-Marathi and Marathi-Hindi APE triplets. Additionally, we incorporate a Quality Estimation (QE)-APE multi-task learning framework. While the experimental results underline the complementary nature of APE and QE, we also observe that QE-APE multitask learning facilitates effective domain adaptation. Our experiments demonstrate that the multilingual APE models outperform their corresponding English-Hindi and English-Marathi single-pair models by $2.5$ and $2.39$ TER points, respectively, with further notable improvements over the multilingual APE model observed through multi-task learning ($+1.29$ and $+1.44$ TER points), data augmentation ($+0.53$ and $+0.45$ TER points) and domain adaptation ($+0.35$ and $+0.45$ TER points). We release the synthetic data, code, and models accrued during this study publicly at https://github.com/cfiltnlp/Multilingual-APE.
arXiv.org Artificial Intelligence
Oct-23-2024
- Country:
- North America > United States
- Pennsylvania > Philadelphia County
- Philadelphia (0.04)
- Minnesota > Hennepin County
- Minneapolis (0.14)
- Massachusetts
- Suffolk County > Boston (0.04)
- Middlesex County > Cambridge (0.04)
- Pennsylvania > Philadelphia County
- Europe
- Germany > Berlin (0.04)
- Belgium (0.04)
- United Kingdom > England
- Surrey (0.04)
- Cambridgeshire > Cambridge (0.04)
- Spain > Catalonia
- Barcelona Province > Barcelona (0.04)
- Portugal > Lisbon
- Lisbon (0.04)
- Italy > Tuscany
- Florence (0.04)
- France > Provence-Alpes-Côte d'Azur
- Bouches-du-Rhône > Marseille (0.04)
- Denmark > Capital Region
- Copenhagen (0.04)
- Asia
- Singapore (0.04)
- Middle East > UAE
- Abu Dhabi Emirate > Abu Dhabi (0.05)
- Japan > Kyūshū & Okinawa
- Kyūshū > Miyazaki Prefecture > Miyazaki (0.04)
- India > Maharashtra
- Mumbai (0.04)
- China > Beijing
- Beijing (0.04)
- North America > United States
- Genre:
- Research Report (0.82)
- Technology: