Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond
Mikel Artetxe, Holger Schwenk
arXiv.org Artificial Intelligence
An increasingly popular approach to alleviate this issue is to first learn general language representations on unlabeled data, which are then integrated in task-specific downstream systems. This approach was first popularized by word embeddings (Mikolov et al., 2013b; Pennington et al., 2014), but has recently been superseded by sentence-level representations (Peters et al., 2018; Devlin et al., 2019). Nevertheless, all these works learn a separate model for each language and are thus unable to leverage information across different languages, greatly limiting their potential performance for low-resource languages. In this work, we are interested in universal language agnostic sentence embeddings, that is, vector representations of sentences that are general with respect to two dimensions: the input language and the NLP task.
This work was performed during an internship at Facebook AI Research.
Sep-25-2019