You Can Have Your Data and Balance It Too: Towards Balanced and Efficient Multilingual Models
Tomasz Limisiewicz, Dan Malkin, Gabriel Stanovsky
arXiv.org Artificial Intelligence
Multilingual models have been widely used for cross-lingual transfer to low-resource languages. However, performance on these languages is hindered by their underrepresentation in the pretraining data. To alleviate this problem, we propose a novel multilingual training technique based on teacher-student knowledge distillation. In this setting, we utilize monolingual teacher models optimized for their language. We use those teachers along with balanced (sub-sampled) data to distill the teachers' knowledge into a single multilingual student. Our method outperforms standard training methods in low-resource languages and retains performance on high-resource languages while using the same amount of data. If applied widely, our approach can increase the representation of low-resource languages in NLP systems.
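The two ingredients the abstract names, balanced sub-sampling across languages and teacher-student distillation, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the temperature-softened KL distillation loss and the per-language sampler are standard components assumed here, and all function names are hypothetical.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) between softened output distributions.

    In the paper's setting the teacher is a monolingual model for the
    example's language; here both are just logit arrays.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return float(kl.mean())

def balanced_subsample(corpora, n_per_lang, seed=0):
    """Draw the same number of examples from every language's corpus,
    so low-resource languages are not drowned out by high-resource ones."""
    rng = np.random.default_rng(seed)
    return {
        lang: [texts[i] for i in rng.choice(len(texts), size=n_per_lang, replace=False)]
        for lang, texts in corpora.items()
    }
```

A training loop would then iterate over the balanced sample and, for each example, minimize `distillation_loss` between the student's logits and those of the teacher matching the example's language.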
May-26-2023