Transformer-Based Low-Resource Language Translation: A Study on Standard Bengali to Sylheti
Oni, Mangsura Kabir; Prama, Tabia Tanzin
arXiv.org Artificial Intelligence
Although the findings highlight the effectiveness of fine-tuned transformer models for Bengali-Sylheti translation, several limitations remain. The dataset size (5,002 parallel sentences) restricts the models' capacity to generalize across diverse syntactic structures, stylistic variations, and domain-specific expressions. In addition, orthographic inconsistencies in Sylheti introduce noise, leading to training instability, particularly in models like mBART-50. Another limitation is the reliance on automatic evaluation metrics such as BLEU and chrF, which may not fully capture the linguistic richness or cultural nuance of Sylheti. Future research should therefore focus on expanding the dataset through community-driven contributions and data augmentation strategies. Incorporating orthographic normalization could improve consistency and reduce variability during training. Hybrid approaches that combine the strengths of pre-trained LLMs with fine-tuned NMT models may also enhance translation robustness in low-resource settings. Finally, incorporating human evaluation will provide a more comprehensive assessment of translation adequacy, fluency, and cultural alignment.
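Both metrics named above are typically computed with the sacrebleu library; the sketch below is illustrative only (the paper does not state which implementation it used, and the hypothesis/reference sentences are placeholders, not the study's data):

```python
import sacrebleu

# Placeholder system outputs and references; in practice the test split of
# the Bengali-Sylheti parallel corpus would be loaded here instead.
hypotheses = ["the model output sentence", "another translated sentence"]
references = ["the reference sentence", "another reference sentence"]

# corpus_bleu/corpus_chrf take a list of hypothesis strings and a list of
# reference streams (one inner list per reference set).
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
chrf = sacrebleu.corpus_chrf(hypotheses, [references])

print(f"BLEU: {bleu.score:.2f}")
print(f"chrF: {chrf.score:.2f}")
```

chrF's character n-gram matching is generally more forgiving of morphological and spelling variation than BLEU's word n-grams, which is one reason it is favored for languages with unsettled orthography such as Sylheti.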
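The orthographic normalization proposed as future work could plausibly start from Unicode canonicalization plus a curated variant table; the following is a minimal sketch of that idea, where the variant entries are assumptions for illustration rather than the paper's actual rules:

```python
import unicodedata

# Illustrative variant table (an assumption, not from the paper): maps
# legacy or alternative encodings to one canonical form. The khanda-ta
# entry reflects a well-known Bengali-script encoding split; a real table
# would be curated from the corpus.
VARIANT_MAP = {
    "\u09a4\u09cd\u200d": "\u09ce",  # TA + VIRAMA + ZWJ -> KHANDA TA
}

def normalize_text(text: str) -> str:
    """Canonicalize Bengali-script text before training or evaluation."""
    text = unicodedata.normalize("NFC", text)  # compose canonical forms
    for variant, canonical in VARIANT_MAP.items():
        text = text.replace(variant, canonical)
    text = text.replace("\u200c", "")          # drop ZWNJ (a design choice)
    return " ".join(text.split())              # collapse stray whitespace
```

NFC composition alone resolves precomposed-versus-decomposed codepoint splits (e.g. for ড়), while corpus-specific spelling variants still require an explicit table.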
Oct-23-2025