Lost in Variation? Evaluating NLI Performance in Basque and Spanish Geographical Variants
Bengoetxea, Jaione, Gonzalez-Dios, Itziar, Agerri, Rodrigo
arXiv.org Artificial Intelligence
In this paper, we evaluate the capacity of current language technologies to understand Basque and Spanish language varieties. We use Natural Language Inference (NLI) as a pivot task and introduce a novel, manually-curated parallel dataset in Basque and Spanish, along with their respective variants. Our empirical analysis of crosslingual and in-context learning experiments using encoder-only and decoder-based Large Language Models (LLMs) shows a performance drop when handling linguistic variation, especially in Basque. Error analysis suggests that this decline is not due to lexical overlap, but rather to the linguistic variation itself. Further ablation experiments indicate that encoder-only models particularly struggle with Western Basque, which aligns with linguistic theory that identifies peripheral dialects (e.g., Western) as more distant from the standard. All data and code are publicly available.
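The comparison the abstract describes (NLI accuracy on a standard variety against its variants, with lexical overlap as a possible confound) can be illustrated with a minimal sketch. The abstract does not state the paper's metrics or how overlap was measured, so everything here is an assumption made for illustration: the function names `accuracy_by_variety` and `token_overlap`, the use of plain accuracy, the use of Jaccard overlap on whitespace tokens, and the made-up toy predictions.

```python
from collections import defaultdict


def accuracy_by_variety(examples):
    """Accuracy per language variety.

    `examples` is an iterable of (variety, gold_label, predicted_label)
    tuples. The tuple layout is an assumption for this sketch, not the
    format of the paper's dataset.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for variety, gold, pred in examples:
        total[variety] += 1
        correct[variety] += int(gold == pred)
    return {variety: correct[variety] / total[variety] for variety in total}


def token_overlap(premise, hypothesis):
    """Jaccard overlap between lowercased whitespace tokens.

    A simple stand-in for a lexical-overlap measure; the paper may use
    a different one.
    """
    p = set(premise.lower().split())
    h = set(hypothesis.lower().split())
    if not p and not h:
        return 0.0
    return len(p & h) / len(p | h)


# Toy, invented predictions: not results from the paper.
toy = [
    ("standard", "entailment", "entailment"),
    ("standard", "contradiction", "contradiction"),
    ("variant", "entailment", "neutral"),
    ("variant", "contradiction", "contradiction"),
]

acc = accuracy_by_variety(toy)
drop = acc["standard"] - acc["variant"]
overlap = token_overlap("the cat sat", "the cat slept")
```

Comparing `drop` across groups of examples with similar `overlap` is one way to check whether a decline tracks lexical overlap or the variation itself.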
Jul-24-2025
- Country:
  - Asia
    - Japan > Honshū
      - Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
    - Middle East > UAE
      - Abu Dhabi Emirate > Abu Dhabi (0.14)
    - Singapore (0.04)
  - Europe
    - Estonia > Tartu County
      - Tartu (0.04)
    - Belgium > Brussels-Capital Region
      - Brussels (0.04)
    - Ireland > Leinster
      - County Dublin > Dublin (0.04)
    - Faroe Islands > Streymoy
      - Tórshavn (0.04)
    - Croatia > Dubrovnik-Neretva County
      - Dubrovnik (0.04)
    - Italy (0.04)
    - United Kingdom > England
      - Cambridgeshire > Cambridge (0.04)
    - Spain
      - Basque Country > Gipuzkoa Province (0.04)
      - Canary Islands > Tenerife (0.04)
      - Catalonia > Barcelona Province
        - Barcelona (0.04)
    - France > Provence-Alpes-Côte d'Azur
      - Bouches-du-Rhône > Marseille (0.04)
    - Middle East > Malta
      - Eastern Region > Northern Harbour District > St. Julian's (0.04)
  - North America
    - Costa Rica > Heredia Province
      - Heredia (0.04)
    - Cuba (0.05)
    - Mexico > Mexico City
      - Mexico City (0.04)
    - United States > Louisiana
      - Orleans Parish > New Orleans (0.04)
  - South America
- Genre:
  - Research Report > New Finding (0.68)
- Technology: