NLIP_Lab-IITH Low-Resource MT System for WMT24 Indic MT Shared Task
Pramit Sahoo, Maharaj Brahma, Maunendra Sankar Desarkar
arXiv.org Artificial Intelligence
In this paper, we describe our system for the WMT24 shared task on Low-Resource Indic Language Translation. We consider eng $\leftrightarrow$ {as, kha, lus, mni} as the participating language pairs. In this shared task, we explore fine-tuning a pre-trained model whose pre-training objective, motivated by alignment augmentation \cite{lin-etal-2020-pre}, draws the embeddings of 22 scheduled Indian languages closer together. Our primary system is based on language-specific fine-tuning of this pre-trained model. We achieve chrF2 scores of 50.6, 42.3, 54.9, and 66.3 on the official public test set for eng$\rightarrow$as, eng$\rightarrow$kha, eng$\rightarrow$lus, and eng$\rightarrow$mni, respectively. We also explore multilingual training, with and without language grouping, and layer freezing. Our code, models, and generated translations are available at https://github.com/pramitsahoo/WMT2024-LRILT.
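The alignment augmentation of \cite{lin-etal-2020-pre} (called random aligned substitution in that work) replaces some source words with their dictionary translations in other languages, so that translation-equivalent words appear in similar contexts during pre-training. The sketch below illustrates only that substitution step under stated assumptions: the function name `random_aligned_substitution`, the substitution probability `p`, and the toy dictionary are hypothetical, and placeholder entries such as `water@as` stand in for real dictionary translations. It is not the pre-training code behind the model used in this system.

```python
import random
from typing import Dict, List, Optional


def random_aligned_substitution(
    tokens: List[str],
    dictionary: Dict[str, List[str]],
    p: float = 0.3,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Hypothetical RAS-style augmentation (not the authors' implementation).

    Each token that has a dictionary entry is replaced, independently with
    probability p, by one of its translations chosen uniformly at random.
    Dictionary keys are assumed to be lowercase.
    """
    if rng is None:
        rng = random.Random()
    out = []
    for tok in tokens:
        candidates = dictionary.get(tok.lower())
        if candidates and rng.random() < p:
            # Swap in a translation from another language.
            out.append(rng.choice(candidates))
        else:
            # No entry, or not selected for substitution: keep the token.
            out.append(tok)
    return out


if __name__ == "__main__":
    # Placeholder "translations", not real Assamese or Manipuri words.
    toy_dict = {"water": ["water@as", "water@mni"]}
    print(random_aligned_substitution("Drink clean water".split(), toy_dict,
                                      p=1.0, rng=random.Random(0)))
```

With `p=0.0` the sentence is returned unchanged. In the cited approach, the code-switched source is presumably paired with the unchanged target sentence, which encourages aligned words to receive similar representations.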
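The reported results use chrF2, i.e. chrF with β = 2, which gives character n-gram recall more weight than precision: chrF_β = (1 + β²)·P·R / (β²·P + R). As a minimal sketch, assuming the original definition (character n-grams of orders 1 to 6 with whitespace removed, precision P and recall R averaged over orders, then combined as above), corpus-level chrF can be computed as below. The function names are illustrative, and toolkits such as sacreBLEU differ in details like the averaging scheme, so this sketch is not guaranteed to reproduce the official scores.

```python
from collections import Counter
from typing import List


def char_ngrams(text: str, n: int) -> Counter:
    # Remove all whitespace, then count overlapping character n-grams.
    s = "".join(text.split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def chrf(hyps: List[str], refs: List[str], max_n: int = 6, beta: float = 2.0) -> float:
    """Corpus-level chrF sketch: n-gram counts are pooled over all segments,
    precision and recall are averaged over orders 1..max_n, then combined
    into an F-beta score scaled to 0-100."""
    if len(hyps) != len(refs):
        raise ValueError("hyps and refs must have the same length")
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        match = hyp_total = ref_total = 0
        for h, r in zip(hyps, refs):
            hc, rc = char_ngrams(h, n), char_ngrams(r, n)
            match += sum((hc & rc).values())  # clipped n-gram matches
            hyp_total += sum(hc.values())
            ref_total += sum(rc.values())
        # Only average over orders that occur in both hypothesis and reference.
        if hyp_total and ref_total:
            precisions.append(match / hyp_total)
            recalls.append(match / ref_total)
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2
    return 100 * (1 + b2) * p * r / (b2 * p + r)


if __name__ == "__main__":
    print(chrf(["the cat sat"], ["the cat sat"]))  # → 100.0
```

Because β = 2, a hypothesis that omits reference content (low recall) is penalized more than one that adds extra characters (low precision).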
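The abstract does not state which layers are frozen in the layer-freezing experiments. As a hypothetical illustration only, the helper below selects the parameters that remain trainable when every parameter whose name starts with one of the given prefixes is frozen; the function name `select_trainable` and the fairseq-style parameter names are assumptions chosen for the example.

```python
from typing import Iterable, List, Sequence


def select_trainable(param_names: Iterable[str], frozen_prefixes: Sequence[str]) -> List[str]:
    """Hypothetical helper: return the parameter names that stay trainable.

    A parameter is frozen if its name starts with any prefix in
    frozen_prefixes; all other parameters remain trainable.
    """
    prefixes = tuple(frozen_prefixes)  # str.startswith accepts a tuple
    return [name for name in param_names if not name.startswith(prefixes)]


if __name__ == "__main__":
    # Illustrative parameter names, not taken from the authors' model.
    names = [
        "encoder.embed_tokens.weight",
        "encoder.layers.0.fc1.weight",
        "decoder.layers.0.fc1.weight",
        "decoder.output_projection.weight",
    ]
    # Freeze the whole encoder; only decoder parameters stay trainable.
    print(select_trainable(names, ["encoder."]))
```

In PyTorch, the same rule could be applied with `for name, param in model.named_parameters(): param.requires_grad = not name.startswith(prefixes)`, so that only the unfrozen parameters receive gradients during fine-tuning.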
Oct-4-2024
- Country:
  - North America > United States
    - Pennsylvania (0.04)
    - Massachusetts > Middlesex County
      - Cambridge (0.14)
  - Europe
    - Belgium (0.04)
    - Portugal > Lisbon
      - Lisbon (0.04)
    - Italy > Tuscany
      - Florence (0.04)
    - Ireland > Leinster
      - County Dublin > Dublin (0.04)
    - Denmark > Capital Region
      - Copenhagen (0.04)
  - Asia
    - Singapore (0.04)
    - Middle East > UAE
      - Abu Dhabi Emirate > Abu Dhabi (0.04)
    - India > Telangana
      - Hyderabad (0.04)
- Genre:
  - Research Report (0.64)
- Technology: