Breaking the Curse of Multilinguality with Cross-lingual Expert Language Models

Blevins, Terra, Limisiewicz, Tomasz, Gururangan, Suchin, Li, Margaret, Gonen, Hila, Smith, Noah A., Zettlemoyer, Luke

Jan-18-2024, arXiv.org Artificial Intelligence

Despite their popularity in non-English NLP, multilingual language models often underperform monolingual ones due to inter-language competition for model parameters. We propose Cross-lingual Expert Language Models (X-ELM), which mitigate this competition by independently training language models on subsets of the multilingual corpus. This process specializes each X-ELM to different languages, while the experts together remain effective as a multilingual ensemble. Our experiments show that, given the same compute budget, X-ELM outperforms jointly trained multilingual models across all considered languages and that these gains transfer to downstream tasks. X-ELM also provides benefits beyond performance improvements: new experts can be added iteratively, adapting X-ELM to new languages without catastrophic forgetting. Furthermore, training is asynchronous, reducing the hardware requirements for multilingual training and democratizing multilingual modeling.
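The abstract does not say how the independently trained experts are combined at inference time. As a rough illustration only, the sketch below assumes the ensemble is a weighted mixture of each expert's next-token distribution. The function names, the expert labels (`expert_a`, `expert_b`), and the weights are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def normalize(weights: Dict[str, float]) -> Dict[str, float]:
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return {name: w / total for name, w in weights.items()}


def ensemble_next_token(
    expert_probs: Dict[str, List[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Mix per-expert next-token distributions into one distribution.

    Assumption (not from the paper): the ensemble is a simple weighted
    average of expert probabilities over a shared vocabulary.
    """
    # Only experts that actually produced a distribution take part.
    norm = normalize({name: weights[name] for name in expert_probs})
    vocab_size = len(next(iter(expert_probs.values())))
    mixed = [0.0] * vocab_size
    for name, probs in expert_probs.items():
        if len(probs) != vocab_size:
            raise ValueError(f"expert {name!r} has a different vocabulary size")
        for i, p in enumerate(probs):
            mixed[i] += norm[name] * p
    return mixed


# Hypothetical toy vocabulary of three tokens and two experts.
expert_probs = {
    "expert_a": [0.7, 0.2, 0.1],
    "expert_b": [0.1, 0.3, 0.6],
}
weights = {"expert_a": 3.0, "expert_b": 1.0}
mixed = ensemble_next_token(expert_probs, weights)
```

Under this assumption, adding a new expert means adding one more entry to `expert_probs` and `weights`. The existing experts' parameters are untouched, which is consistent with the claim that new languages can be added without catastrophic forgetting.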

computational linguistics, experiment, tf-idf

Country:
- North America
  - Dominican Republic (0.04)
  - United States
    - Washington > King County
      - Seattle (0.04)
    - California > San Diego County
      - San Diego (0.04)
- Europe
  - Czechia > Prague (0.04)
  - Belgium (0.04)
- Asia > Middle East
  - Jordan (0.04)
  - UAE > Abu Dhabi Emirate
    - Abu Dhabi (0.04)

Genre:
- Research Report > New Finding (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (1.00)
  - Machine Learning > Statistical Learning
    - Clustering (1.00)