The VolcTrans System for WMT22 Multilingual Machine Translation Task
Qian, Xian, Hu, Kai, Wang, Jiaqiang, Liu, Yifeng, Pan, Xingyuan, Cao, Jun, Wang, Mingxuan
arXiv.org Artificial Intelligence
This report describes our VolcTrans system for the WMT22 shared task on large-scale multilingual machine translation. We participated in the unconstrained track, which allows the use of external resources. Our system is a Transformer-based multilingual model trained on data from multiple sources, including the public training set from the data track, NLLB data provided by Meta AI, self-collected parallel corpora, and pseudo bitext from back-translation. A series of heuristic rules is used to clean both bilingual and monolingual texts. On the official test set, our system achieves 17.3 BLEU, 21.9 spBLEU, and 41.9 chrF2++ on average over all language pairs. The average inference speed is 11.5 sentences per second using a single Nvidia Tesla V100 GPU. Our code and trained models are available at https://github.com/xian8/wmt22
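The abstract does not list the heuristic cleaning rules themselves. As a rough illustration of what rule-based bitext cleaning typically looks like, the following is a minimal sketch under stated assumptions: the rules (empty-side removal, source-equals-target removal, a length cap, a length-ratio cap, exact deduplication), the thresholds, and the function names `keep_pair` and `clean_bitext` are all hypothetical and are not taken from the VolcTrans system.

```python
from typing import Iterable, List, Tuple


def keep_pair(src: str, tgt: str, max_len: int = 250, max_ratio: float = 3.0) -> bool:
    """Return True if a sentence pair passes the illustrative filters.

    All rules and thresholds here are assumptions, not the authors' rules.
    """
    src, tgt = src.strip(), tgt.strip()
    # Drop pairs with an empty side.
    if not src or not tgt:
        return False
    # Drop pairs where the "translation" is a verbatim copy of the source.
    if src == tgt:
        return False
    # Whitespace token counts; a crude proxy for scripts without spaces.
    n_src, n_tgt = len(src.split()), len(tgt.split())
    # Drop overly long sentences.
    if n_src > max_len or n_tgt > max_len:
        return False
    # Drop pairs whose lengths differ too much.
    return max(n_src, n_tgt) / min(n_src, n_tgt) <= max_ratio


def clean_bitext(pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Apply keep_pair and remove exact duplicate pairs, preserving order."""
    seen = set()
    kept = []
    for src, tgt in pairs:
        key = (src.strip(), tgt.strip())
        if key in seen or not keep_pair(src, tgt):
            continue
        seen.add(key)
        kept.append(key)
    return kept
```

Whitespace tokenization keeps the sketch dependency-free, but it undercounts length for languages written without spaces, so a real pipeline would likely count characters or subword units instead.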
Oct-20-2022
- Genre:
- Research Report (1.00)
- Industry:
- Information Technology (0.35)