NLIP_Lab-IITH Low-Resource MT System for WMT24 Indic MT Shared Task

Sahoo, Pramit, Brahma, Maharaj, Desarkar, Maunendra Sankar

arXiv.org Artificial Intelligence 

In this paper, we describe our system for the WMT 24 shared task of Low-Resource Indic Language Translation. We consider eng $\leftrightarrow$ {as, kha, lus, mni} as participating language pairs. In this shared task, we explore the finetuning of a pre-trained model motivated by the pre-trained objective of aligning embeddings closer by alignment augmentation \cite{lin-etal-2020-pre} for 22 scheduled Indian languages. Our primary system is based on language-specific finetuning on a pre-trained model. We achieve chrF2 scores of 50.6, 42.3, 54.9, and 66.3 on the official public test set for eng$\rightarrow$as, eng$\rightarrow$kha, eng$\rightarrow$lus, eng$\rightarrow$mni respectively. We also explore multilingual training with/without language grouping and layer-freezing. Our code, models, and generated translations are available here: https://github.com/pramitsahoo/WMT2024-LRILT.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found