Low-resource Machine Translation for Code-switched Kazakh-Russian Language Pair
Borisov, Maksim, Kozhirbayev, Zhanibek, Malykh, Valentin
arXiv.org Artificial Intelligence
Machine translation for low-resource language pairs is a challenging task, and it becomes extremely difficult when a speaker uses code-switching. We propose a method for building a machine translation model for the code-switched Kazakh-Russian language pair with no labeled data. Our method is based on generating synthetic data. Additionally, we present the first code-switched Kazakh-Russian parallel corpus and our evaluation results, which include a model that achieves 16.48 BLEU, almost matching an existing commercial system on that metric and outperforming it in human evaluation.
Mar-25-2025
- Country:
  - Asia
    - China > Hong Kong (0.04)
    - Indonesia > Bali (0.04)
    - Kazakhstan > Akmola Region > Astana (0.04)
    - Middle East
      - Qatar > Ad-Dawhah > Doha (0.04)
      - Republic of Türkiye > Istanbul Province > Istanbul (0.04)
    - Russia (0.04)
  - Europe
    - Belgium > Brussels-Capital Region > Brussels (0.04)
    - France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
    - Ireland > Leinster > County Dublin > Dublin (0.04)
    - Italy > Tuscany > Florence (0.04)
    - Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
    - Russia > Northwestern Federal District > Leningrad Oblast > Saint Petersburg (0.04)
    - Sweden > Vaestra Goetaland > Gothenburg (0.04)
    - United Kingdom > England > Oxfordshire > Oxford (0.04)
  - North America
    - Canada > Ontario > Toronto (0.04)
    - United States
      - Minnesota > Hennepin County > Minneapolis (0.14)
      - New Mexico > Santa Fe County > Santa Fe (0.04)
      - Texas > Travis County > Austin (0.04)
  - Oceania > Australia
- Genre:
  - Research Report > New Finding (0.46)
- Industry:
  - Information Technology (0.93)
- Technology: