It's the same but not the same: Do LLMs distinguish Spanish varieties?

Mayor-Rocher, Marina; Pozo, Cristina; Melero, Nina; Martínez, Gonzalo; Grandury, María; Reviriego, Pedro

arXiv.org Artificial Intelligence 

Abstract: In recent years, large language models (LLMs) have demonstrated a high capacity for understanding and generating text in Spanish. However, with five hundred million native speakers, Spanish is not a homogeneous language but rather one rich in diatopic variations spanning both sides of the Atlantic. For this reason, in this study, we evaluate the ability of nine language models to identify and distinguish the morphosyntactic and lexical peculiarities of seven varieties of Spanish (Andean, Antillean, Continental Caribbean, Chilean, Peninsular, Mexican and Central American, and Rioplatense) through a multiple-choice test. The results indicate that the Peninsular Spanish variety is the best identified by all models and that, among them, GPT-4o is the only model capable of recognizing the variability of the Spanish language.