URIEL+: Enhancing Linguistic Inclusion and Usability in a Typological and Multilingual Knowledge Base
Khan, Aditya, Shipton, Mason, Anugraha, David, Duan, Kaiyao, Hoang, Phuong H., Khiu, Eric, Doğruöz, A. Seza, Lee, En-Shiun Annie
–arXiv.org Artificial Intelligence
URIEL is a knowledge base offering geographical, phylogenetic, and typological vector representations for 7970 languages. It includes distance measures between these vectors for 4005 languages, which are accessible via the lang2vec tool. Despite being frequently cited, URIEL is limited in terms of linguistic inclusion and overall usability. To tackle these challenges, we introduce URIEL+, an enhanced version of URIEL and lang2vec that addresses these limitations. In addition to expanding typological feature coverage for 2898 languages, URIEL+ improves the user experience with robust, customizable distance calculations to better suit the needs of users. These upgrades also offer competitive performance on downstream tasks and provide distances that better align with linguistic distance studies.
arXiv.org Artificial Intelligence
Dec-19-2024
- Country:
- Asia
- Indonesia > Bali (0.04)
- Japan > Kyūshū & Okinawa
- Kyūshū > Miyazaki Prefecture > Miyazaki (0.04)
- Middle East > Israel (0.04)
- Europe
- Belgium > Flanders
- East Flanders > Ghent (0.04)
- Finland > Northern Ostrobothnia
- Oulu (0.04)
- France > Provence-Alpes-Côte d'Azur
- Bouches-du-Rhône > Marseille (0.04)
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Italy > Tuscany
- Florence (0.04)
- Middle East > Malta
- Eastern Region > Northern Harbour District > St. Julian's (0.04)
- Spain > Valencian Community
- Valencia Province > Valencia (0.04)
- Belgium > Flanders
- North America
- Canada > Ontario
- Toronto (0.14)
- Dominican Republic (0.04)
- Mexico > Mexico City
- Mexico City (0.04)
- United States
- Colorado (0.04)
- Michigan (0.04)
- Minnesota > Hennepin County
- Minneapolis (0.14)
- Canada > Ontario
- Oceania > Australia
- New South Wales > Sydney (0.04)
- Asia
- Genre:
- Research Report (1.00)
- Technology: