Low-resource Machine Translation for Code-switched Kazakh-Russian Language Pair
Borisov, Maksim, Kozhirbayev, Zhanibek, Malykh, Valentin
arXiv.org Artificial Intelligence
Machine translation for low-resource language pairs is a challenging task, and it can become extremely difficult once a speaker uses code-switching. We propose a method for building a machine translation model for the code-switched Kazakh-Russian language pair with no labeled data. Our method is based on generating synthetic data. Additionally, we present the first code-switching Kazakh-Russian parallel corpus and evaluation results, which include a model that achieves 16.48 BLEU, almost matching an existing commercial system on that metric and outperforming it in human evaluation.
Mar-25-2025
- Country:
- Asia (1.00)
- Europe > Russia
- North America > United States
- Minnesota (0.28)
- Genre:
- Research Report > New Finding (0.46)
- Industry:
- Information Technology (0.46)
- Technology: