Analyzing the Effect of Linguistic Similarity on Cross-Lingual Transfer: Tasks and Experimental Setups Matter
Verena Blaschke, Masha Fedzechkina, Maartje ter Hoeve
arXiv.org Artificial Intelligence
Cross-lingual transfer is a popular approach to increase the amount of training data for NLP tasks in a low-resource context. However, the best strategy to decide which cross-lingual data to include is unclear. Prior research often focuses on a small set of languages from a few language families and/or a single task. It is still an open question how these findings extend to a wider variety of languages and tasks. In this work, we analyze cross-lingual transfer for 266 languages from a wide variety of language families. Moreover, we include three popular NLP tasks: POS tagging, dependency parsing, and topic classification. Our findings indicate that the effect of linguistic similarity on transfer performance depends on a range of factors: the NLP task, the (mono- or multilingual) input representations, and the definition of linguistic similarity.
Jan-24-2025
- Genre:
- Research Report > New Finding (1.00)