From Priest to Doctor: Domain Adaptation for Low-Resource Neural Machine Translation
Marashian, Ali; Rice, Enora; Gessler, Luke; Palmer, Alexis; von der Wense, Katharina
arXiv.org Artificial Intelligence
Many of the world's languages have insufficient data to train high-performing general neural machine translation (NMT) models, let alone domain-specific models, and often the only available parallel data are small amounts of religious text. Hence, domain adaptation (DA) is a crucial issue for contemporary NMT, yet it has so far been underexplored for low-resource languages. In this paper, we evaluate a set of methods from both low-resource NMT and DA in a realistic setting, in which we aim to translate between a high-resource and a low-resource language with access to only (a) parallel Bible data, (b) a bilingual dictionary, and (c) a monolingual target-domain corpus in the high-resource language. Our results show that the effectiveness of the tested methods varies, with the simplest one, DALI (Domain Adaptation by Lexicon Induction), being the most effective. We follow up with a small human evaluation of DALI, which shows that more careful investigation is still needed into how to accomplish DA for low-resource NMT.
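The abstract names DALI as the most effective method. To make the general idea behind DALI-style adaptation concrete (translating in-domain monolingual text word for word with a lexicon to obtain pseudo-parallel training data), here is a minimal Python sketch. It assumes whitespace tokenization, uses the bilingual dictionary directly as the lexicon, and copies words missing from the dictionary through unchanged. All three are assumptions made for illustration. The function names and the toy `xx_` lexicon entries are hypothetical placeholders, and this is not the authors' implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical helpers: a sketch of DALI-style data synthesis, not the authors' code.


def word_for_word(sentence: str, lexicon: Dict[str, str]) -> str:
    """Translate a whitespace-tokenized sentence token by token.

    Tokens missing from the lexicon are copied through unchanged
    (an assumption made for this sketch).
    """
    return " ".join(lexicon.get(tok, tok) for tok in sentence.split())


def build_pseudo_parallel(
    monolingual: List[str], lexicon: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Pair each in-domain high-resource sentence with its word-for-word
    low-resource rendering, as (synthetic_lrl, original_hrl)."""
    return [(word_for_word(s, lexicon), s) for s in monolingual]


def lexicon_coverage(monolingual: List[str], lexicon: Dict[str, str]) -> float:
    """Fraction of corpus tokens that the lexicon can translate."""
    tokens = [tok for s in monolingual for tok in s.split()]
    if not tokens:
        return 0.0
    return sum(tok in lexicon for tok in tokens) / len(tokens)


if __name__ == "__main__":
    # Toy lexicon: the "xx_" entries are placeholders, not real words.
    lexicon = {"the": "xx_the", "doctor": "xx_doctor",
               "treats": "xx_treats", "fever": "xx_fever"}
    corpus = ["the doctor treats the fever", "the nurse treats fever"]
    for synthetic, original in build_pseudo_parallel(corpus, lexicon):
        print(f"{synthetic}\t{original}")
    print(f"coverage: {lexicon_coverage(corpus, lexicon):.2f}")
```

In this sketch, the resulting pairs would then be used to fine-tune a model first trained on the parallel Bible data. The coverage helper gives a quick sense of how much of the target-domain text the dictionary actually reaches. A low value suggests the synthetic side is mostly copied high-resource words.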
Dec-1-2024
- Country:
  - North America
    - United States
      - Pennsylvania (0.04)
      - Indiana (0.04)
      - Minnesota > Hennepin County
        - Minneapolis (0.14)
      - Georgia > Fulton County
        - Atlanta (0.04)
      - Colorado > Boulder County
        - Boulder (0.04)
    - Mexico > Mexico City
      - Mexico City (0.04)
  - Europe
    - Iceland (0.04)
    - Spain (0.04)
    - Italy (0.04)
    - Bulgaria > Varna Province
      - Varna (0.04)
    - Germany
      - Rheinland-Pfalz > Mainz (0.04)
      - Berlin (0.04)
    - France > Provence-Alpes-Côte d'Azur
      - Bouches-du-Rhône > Marseille (0.04)
    - Denmark > Capital Region
      - Copenhagen (0.04)
    - Portugal > Lisbon
      - Lisbon (0.14)
    - Sweden > Vaestra Goetaland
      - Gothenburg (0.04)
    - Belgium > Brussels-Capital Region
      - Brussels (0.04)
  - Asia
    - Singapore (0.04)
    - China > Hong Kong (0.04)
    - Indonesia > Bali (0.04)
    - Middle East > UAE
      - Abu Dhabi Emirate > Abu Dhabi (0.04)
- Genre:
  - Research Report > New Finding (0.68)
- Industry:
  - Government (0.68)
  - Law (0.46)