mmT5: Modular Multilingual Pre-Training Solves Source Language Hallucinations
Jonas Pfeiffer, Francesco Piccinno, Massimo Nicosia, Xinyi Wang, Machel Reid, Sebastian Ruder
arXiv.org Artificial Intelligence
Multilingual sequence-to-sequence models perform poorly with increased language coverage and fail to consistently generate text in the correct target language in few-shot settings. To address these challenges, we propose mmT5, a modular multilingual sequence-to-sequence model. mmT5 utilizes language-specific modules during pre-training, which disentangle language-specific information from language-agnostic information. We identify representation drift during fine-tuning as a key limitation of modular generative models and develop strategies that enable effective zero-shot transfer. Our model outperforms mT5 at the same parameter sizes by a large margin on representative natural language understanding and generation tasks in 40+ languages. Compared to mT5, mmT5 raises the rate of generating text in the correct language under zero-shot settings from 7% to 99%, thereby greatly alleviating the source language hallucination problem.
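The abstract does not say how the language-specific modules are built, so the following is only a minimal sketch of the general idea, assuming a bottleneck-style module per language added on top of a shared transformation. The class name `LanguageModularLayer`, the helper `matvec`, the dimensions, and the ReLU bottleneck are all hypothetical choices for illustration, not the authors' implementation.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LanguageModularLayer:
    """Hypothetical layer: shared weights plus one small module per language."""

    def __init__(self, d_model, d_bottleneck, languages, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(rows)]

        # Language-agnostic parameters, used for every input.
        self.shared = mat(d_model, d_model)
        # Language-specific parameters: a (down, up) projection pair per language.
        self.modules = {
            lang: (mat(d_bottleneck, d_model), mat(d_model, d_bottleneck))
            for lang in languages
        }

    def forward(self, x, lang):
        h = matvec(self.shared, x)                   # shared computation
        down, up = self.modules[lang]                # route by language ID
        z = [max(0.0, v) for v in matvec(down, h)]   # ReLU bottleneck (assumed)
        delta = matvec(up, z)
        return [a + b for a, b in zip(h, delta)]     # residual connection


layer = LanguageModularLayer(d_model=4, d_bottleneck=2, languages=["en", "de"])
x = [1.0, 0.5, -0.5, 2.0]
out_en = layer.forward(x, "en")
out_de = layer.forward(x, "de")
```

In this sketch only the routed module differs between `out_en` and `out_de`; the shared matrix is applied identically to both, which is one way to picture the separation of language-specific from language-agnostic information that the abstract describes.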
May 23, 2023
- Genre:
- Research Report (0.49)
- Technology: