Transformer-Based Low-Resource Language Translation: A Study on Standard Bengali to Sylheti
Oni, Mangsura Kabir; Prama, Tabia Tanzin
arXiv.org Artificial Intelligence
Although the findings highlight the effectiveness of fine-tuned transformer models for Bengali-Sylheti translation, several limitations remain. The dataset size (5,002 parallel sentences) restricts the models' capacity to generalize across diverse syntactic structures, stylistic variations, and domain-specific expressions. In addition, orthographic inconsistencies in Sylheti introduce noise, leading to training instability, particularly in models like mBART-50. Another limitation is the reliance on automatic evaluation metrics such as BLEU and chrF, which may not fully capture the linguistic richness or cultural nuance of Sylheti. Future research should therefore focus on expanding the dataset through community-driven contributions and data augmentation strategies. Incorporating orthographic normalization could improve consistency and reduce variability during training. Hybrid approaches that combine the strengths of pre-trained LLMs with fine-tuned NMT models may also enhance translation robustness in low-resource settings. Finally, incorporating human evaluation will provide a more comprehensive assessment of translation adequacy, fluency, and cultural alignment.
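Both metrics named above are typically computed with the sacrebleu library; the sketch below is illustrative only (the paper does not state which implementation it used, and the hypothesis/reference sentences are placeholders, not the study's data):

```python
import sacrebleu

# Placeholder system outputs and references; in practice the test split of
# the Bengali-Sylheti parallel corpus would be loaded here instead.
hypotheses = ["the model output sentence", "another translated sentence"]
references = ["the reference sentence", "another reference sentence"]

# corpus_bleu/corpus_chrf take a list of hypothesis strings and a list of
# reference streams (one inner list per reference set).
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
chrf = sacrebleu.corpus_chrf(hypotheses, [references])

print(f"BLEU: {bleu.score:.2f}")
print(f"chrF: {chrf.score:.2f}")
```

chrF's character n-gram matching is generally more forgiving of morphological and spelling variation than BLEU's word n-grams, which is one reason it is favored for languages with unsettled orthography such as Sylheti.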
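The orthographic normalization proposed as future work could plausibly start from Unicode canonicalization plus a curated variant table; the following is a minimal sketch of that idea, where the variant entries are assumptions for illustration rather than the paper's actual rules:

```python
import unicodedata

# Illustrative variant table (an assumption, not from the paper): maps
# legacy or alternative encodings to one canonical form. The khanda-ta
# entry reflects a well-known Bengali-script encoding split; a real table
# would be curated from the corpus.
VARIANT_MAP = {
    "\u09a4\u09cd\u200d": "\u09ce",  # TA + VIRAMA + ZWJ -> KHANDA TA
}

def normalize_text(text: str) -> str:
    """Canonicalize Bengali-script text before training or evaluation."""
    text = unicodedata.normalize("NFC", text)  # compose canonical forms
    for variant, canonical in VARIANT_MAP.items():
        text = text.replace(variant, canonical)
    text = text.replace("\u200c", "")          # drop ZWNJ (a design choice)
    return " ".join(text.split())              # collapse stray whitespace
```

NFC composition alone resolves precomposed-versus-decomposed codepoint splits (e.g. for ড়), while corpus-specific spelling variants still require an explicit table.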
Oct-23-2025