Reconsidering SMT Over NMT for Closely Related Languages: A Case Study of Persian-Hindi Pair
Yousofi, Waisullah, Bhattacharyya, Pushpak
arXiv.org Artificial Intelligence
This paper demonstrates that Phrase-Based Statistical Machine Translation (PBSMT) can outperform Transformer-based Neural Machine Translation (NMT) in moderate-resource scenarios, specifically for structurally similar languages such as the Persian-Hindi pair. Transformer architectures typically favor large parallel corpora; on our dataset, PBSMT achieves a BLEU score of 66.32, exceeding the Transformer-NMT score of 53.7 on the same data by 12.62 points. Additionally, we explore variations of the SMT architecture, including training on Romanized text and modifying the word order of Persian sentences to match the left-to-right (LTR) structure of Hindi. Our findings highlight the importance of choosing the architecture based on the characteristics of the language pair, and they advocate for SMT as a high-performing alternative even in contexts commonly dominated by NMT.
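The abstract does not say how the Persian word order is modified. As a minimal sketch of one possible reading, the snippet below reverses the whitespace-separated tokens of each source sentence before training. The function names `reverse_word_order` and `reverse_corpus` are hypothetical, and whitespace tokenization is an assumption; the authors' actual preprocessing may differ.

```python
from typing import Iterable, List


def reverse_word_order(sentence: str) -> str:
    """Return the sentence with its whitespace-separated tokens reversed.

    Assumption: tokens are split on whitespace only. This is an
    illustrative preprocessing step, not the paper's confirmed method.
    """
    return " ".join(reversed(sentence.split()))


def reverse_corpus(lines: Iterable[str]) -> List[str]:
    """Apply the token reversal to every line of the source side of a corpus."""
    return [reverse_word_order(line) for line in lines]


if __name__ == "__main__":
    # Placeholder tokens stand in for Persian words.
    print(reverse_word_order("w1 w2 w3"))
```

Only the source side would be transformed in this sketch; the Hindi target side stays unchanged, so that the phrase aligner sees both sides in the same reading direction.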
Dec-22-2024
- Country:
- Asia
- China > Beijing
- Beijing (0.04)
- India > Maharashtra
- Mumbai (0.04)
- Japan > Kyūshū & Okinawa
- Kyūshū > Miyazaki Prefecture > Miyazaki (0.04)
- Middle East > UAE
- Abu Dhabi Emirate > Abu Dhabi (0.04)
- Taiwan > Taiwan Province
- Taipei (0.04)
- Europe
- Belgium > Brussels-Capital Region
- Brussels (0.04)
- Denmark > Capital Region
- Copenhagen (0.04)
- France > Provence-Alpes-Côte d'Azur
- Bouches-du-Rhône > Marseille (0.04)
- Germany > Berlin (0.04)
- Lithuania (0.04)
- North America > United States
- Colorado > Denver County > Denver (0.04)
- Oceania > Australia
- Genre:
- Research Report > New Finding (1.00)
- Technology: