From Priest to Doctor: Domain Adaptation for Low-Resource Neural Machine Translation

Marashian, Ali, Rice, Enora, Gessler, Luke, Palmer, Alexis, von der Wense, Katharina

Dec-1-2024–arXiv.org Artificial Intelligence

Many of the world's languages have insufficient data to train high-performing general neural machine translation (NMT) models, let alone domain-specific models, and often the only available parallel data are small amounts of religious texts. Hence, domain adaptation (DA) is a crucial issue faced by contemporary NMT and has, so far, been underexplored for low-resource languages. In this paper, we evaluate a set of methods from both low-resource NMT and DA in a realistic setting, in which we aim to translate between a high-resource and a low-resource language with access to only: a) parallel Bible data, b) a bilingual dictionary, and c) a monolingual target-domain corpus in the high-resource language. Our results show that the effectiveness of the tested methods varies, with the simplest one, DALI, being most effective. We follow up with a small human evaluation of DALI, which shows that there is still a need for more careful investigation of how to accomplish DA for low-resource NMT.

artificial intelligence, natural language, translation

Country:
- North America
  - United States
    - Pennsylvania (0.04)
    - Indiana (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.14)
    - Georgia > Fulton County
      - Atlanta (0.04)
    - Colorado > Boulder County
      - Boulder (0.04)
  - Mexico > Mexico City
    - Mexico City (0.04)
- Europe
  - Iceland (0.04)
  - Spain (0.04)
  - Italy (0.04)
  - Bulgaria > Varna Province
    - Varna (0.04)
  - Germany
    - Rheinland-Pfalz > Mainz (0.04)
    - Berlin (0.04)
  - France > Provence-Alpes-Côte d'Azur
    - Bouches-du-Rhône > Marseille (0.04)
  - Denmark > Capital Region
    - Copenhagen (0.04)
  - Portugal > Lisbon
    - Lisbon (0.14)
  - Sweden > Västra Götaland
    - Gothenburg (0.04)
  - Belgium > Brussels-Capital Region
    - Brussels (0.04)
- Asia
  - Singapore (0.04)
  - China > Hong Kong (0.04)
  - Indonesia > Bali (0.04)
  - Middle East > UAE
    - Abu Dhabi Emirate > Abu Dhabi (0.04)

Genre:
- Research Report > New Finding (0.68)

Industry:
- Government (0.68)
- Law (0.46)

Technology:
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)