Modality Matching Matters: Calibrating Language Distances for Cross-Lingual Transfer in URIEL+
Ng, York Hay, Khan, Aditya, Lu, Xiang, Salloum, Matteo, Zhou, Michael, Hoang, Phuong H., Doğruöz, A. Seza, Lee, En-Shiun Annie
–arXiv.org Artificial Intelligence
Existing linguistic knowledge bases such as URIEL+ provide valuable geographic, genetic and typological distances for cross-lingual transfer but suffer from two key limitations. One, their one-size-fits-all vector representations are ill-suited to the diverse structures of linguistic data, and two, they lack a principled method for aggregating these signals into a single, comprehensive score. In this paper, we address these gaps by introducing a framework for type-matched language distances. We propose novel, structure-aware representations for each distance type: speaker-weighted distributions for geography, hyperbolic embeddings for genealogy, and a latent variables model for typology. We unify these signals into a robust, task-agnostic composite distance. In selecting transfer languages, our representations and composite distances consistently improve performance across a wide range of NLP tasks, providing a more principled and effective toolkit for multilingual research.
arXiv.org Artificial Intelligence
Oct-23-2025
- Country:
- Asia
- China > Hong Kong (0.04)
- Indonesia > Bali (0.04)
- Middle East > UAE
- Abu Dhabi Emirate > Abu Dhabi (0.14)
- Europe
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Croatia > Dubrovnik-Neretva County
- Dubrovnik (0.04)
- Ukraine (0.04)
- Belgium
- Brussels-Capital Region > Brussels (0.04)
- Flanders > East Flanders
- Ghent (0.04)
- France
- Hauts-de-France > Nord
- Lille (0.04)
- Provence-Alpes-Côte d'Azur > Bouches-du-Rhône
- Marseille (0.04)
- Hauts-de-France > Nord
- Denmark > Capital Region
- Copenhagen (0.04)
- Italy > Tuscany
- Florence (0.04)
- Middle East > Malta
- Eastern Region > Northern Harbour District > St. Julian's (0.04)
- Austria > Vienna (0.14)
- Spain > Galicia
- Madrid (0.04)
- Ireland > Leinster
- North America
- Canada > Ontario
- Toronto (0.04)
- Dominican Republic (0.04)
- Mexico > Mexico City
- Mexico City (0.04)
- United States
- Florida > Miami-Dade County
- Miami (0.04)
- Illinois > Cook County
- Chicago (0.04)
- Michigan (0.04)
- Minnesota > Hennepin County
- Minneapolis (0.14)
- New Mexico > Bernalillo County
- Albuquerque (0.04)
- Florida > Miami-Dade County
- Canada > Ontario
- Asia
- Genre:
- Research Report > New Finding (0.93)
- Technology: