Breaking the Curse of Multilinguality with Cross-lingual Expert Language Models
Blevins, Terra, Limisiewicz, Tomasz, Gururangan, Suchin, Li, Margaret, Gonen, Hila, Smith, Noah A., Zettlemoyer, Luke
–arXiv.org Artificial Intelligence
Despite their popularity in non-English NLP, multilingual language models often underperform monolingual ones due to inter-language competition for model parameters. We propose Cross-lingual Expert Language Models (X-ELM), which mitigate this competition by independently training language models on subsets of the multilingual corpus. This process specializes X-ELMs to different languages while remaining effective as a multilingual ensemble. Our experiments show that when given the same compute budget, X-ELM outperforms jointly trained multilingual models across all considered languages and that these gains transfer to downstream tasks. X-ELM provides additional benefits over performance improvements: new experts can be iteratively added, adapting X-ELM to new languages without catastrophic forgetting. Furthermore, training is asynchronous, reducing the hardware requirements for multilingual training and democratizing multilingual modeling.
arXiv.org Artificial Intelligence
Jan-18-2024
- Country:
- North America
- Dominican Republic (0.04)
- United States
- Washington > King County
- Seattle (0.04)
- California > San Diego County
- San Diego (0.04)
- Washington > King County
- Europe
- Asia > Middle East
- Jordan (0.04)
- UAE > Abu Dhabi Emirate
- Abu Dhabi (0.04)
- North America
- Genre:
- Research Report > New Finding (0.46)
- Technology: