Soft Language Clustering for Multilingual Model Pre-training
Jiali Zeng, Yufan Jiang, Yongjing Yin, Yi Jing, Fandong Meng, Binghuai Lin, Yunbo Cao, Jie Zhou
arXiv.org Artificial Intelligence
Multilingual pre-trained language models have demonstrated impressive (zero-shot) cross-lingual transfer abilities; however, their performance is hindered when the target language is typologically distant from the source languages or when pre-training data is limited in size. In this paper, we propose XLM-P, which contextually retrieves prompts as flexible guidance for conditionally encoding instances. XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods. On the XTREME tasks, including text classification, sequence labeling, question answering, and sentence retrieval, both base- and large-size language models pre-trained with our proposed method exhibit consistent performance improvements. Furthermore, the method provides substantial advantages for low-resource languages in unsupervised sentence retrieval and for target languages that differ greatly from the source language in cross-lingual transfer.
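For intuition, the sketch below shows one way instance-conditioned prompt retrieval could be wired in front of a multilingual encoder: a pool of learnable prompts is softly mixed according to each input's representation and prepended to the token embeddings. The module name, pool size, prompt length, and pooling scheme are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftPromptRetriever(nn.Module):
    """Illustrative sketch of instance-conditioned prompt retrieval.

    A pool of learnable prompts is scored against a pooled instance
    representation; the softmax weights act as a soft cluster
    assignment, and the weighted prompt mixture is prepended to the
    token embeddings before the encoder.
    """

    def __init__(self, num_prompts: int = 32, hidden_size: int = 768, prompt_len: int = 4):
        super().__init__()
        # Pool of candidate prompts: (num_prompts, prompt_len, hidden_size)
        self.prompts = nn.Parameter(torch.randn(num_prompts, prompt_len, hidden_size) * 0.02)
        # Keys used to score prompts against the instance representation
        self.keys = nn.Parameter(torch.randn(num_prompts, hidden_size) * 0.02)

    def forward(self, token_embeddings: torch.Tensor) -> torch.Tensor:
        # token_embeddings: (batch, seq_len, hidden_size)
        query = token_embeddings.mean(dim=1)              # (batch, hidden_size)
        scores = query @ self.keys.t()                    # (batch, num_prompts)
        weights = F.softmax(scores, dim=-1)               # soft assignment over the pool
        # Weighted mixture of prompts: (batch, prompt_len, hidden_size)
        mixed = torch.einsum("bk,kph->bph", weights, self.prompts)
        # Prepend the retrieved prompt to the token embeddings
        return torch.cat([mixed, token_embeddings], dim=1)

# Usage: run before a transformer encoder so the prompt conditions encoding.
retriever = SoftPromptRetriever()
dummy = torch.randn(2, 16, 768)       # hypothetical batch of token embeddings
extended = retriever(dummy)           # shape: (2, 4 + 16, 768)
```

Because the prompt mixture is computed per instance, language-specific knowledge can concentrate in particular prompts while shared prompts capture language-invariant structure; this is the intuition behind the soft clustering framing, not a claim about the exact parameterization used in XLM-P.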
Jun-13-2023