Replicable Benchmarking of Neural Machine Translation (NMT) on Low-Resource Local Languages in Indonesia
Susanto, Lucky; Diandaru, Ryandito; Krisnadhi, Adila; Purwarianti, Ayu; Wijaya, Derry
arXiv.org Artificial Intelligence
Neural machine translation (NMT) for low-resource local languages in Indonesia faces significant challenges, including limited data availability and the need for a representative benchmark. This work addresses these challenges by comprehensively analyzing the training of NMT systems for four low-resource local languages in Indonesia: Javanese, Sundanese, Minangkabau, and Balinese. Our study covers various training approaches, paradigms, and data sizes, and includes a preliminary study of using large language models to generate synthetic parallel data for low-resource languages. We reveal specific trends and insights into practical strategies for low-resource language translation. Our research demonstrates that, despite limited computational resources and textual data, several of our NMT systems achieve competitive performance, rivaling the translation quality of zero-shot gpt-3.5-turbo. These findings significantly advance NMT for low-resource languages, offering valuable guidance for researchers in similar contexts.
Nov-2-2023
- Country:
- North America
- Dominican Republic (0.04)
- United States
- Texas > Dallas County
- Dallas (0.04)
- Minnesota > Hennepin County
- Minneapolis (0.14)
- Canada > British Columbia
- Europe
- Germany (0.04)
- Spain (0.04)
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Croatia > Dubrovnik-Neretva County
- Dubrovnik (0.04)
- Belgium > Brussels-Capital Region
- Brussels (0.04)
- Albania > Tirana County
- Tirana (0.04)
- Asia
- Myanmar (0.04)
- China > Hong Kong (0.04)
- Vietnam > Hanoi
- Hanoi (0.04)
- Middle East > UAE
- Abu Dhabi Emirate > Abu Dhabi (0.04)
- Indonesia > Borneo
- Kalimantan > East Kalimantan > Nusantara (0.04)
- Genre:
- Research Report > New Finding (0.93)
- Technology: