Spanish language
RigoChat 2: an adapted language model to Spanish using a bounded dataset and reduced hardware
Gómez, Gonzalo Santamaría, Subies, Guillem García, Ruiz, Pablo Gutiérrez, Valero, Mario González, Fuertes, Natàlia, Zamorano, Helena Montoro, Sanz, Carmen Muñoz, Plaza, Leire Rosado, García, Nuria Aldama, Sánchez, David Betancur, Sushkova, Kateryna, Nieto, Marta Guerrero, Jiménez, Álvaro Barbero
Large Language Models (LLMs) have become a key element of modern artificial intelligence, demonstrating the ability to address a wide range of language processing tasks at unprecedented levels of accuracy without the need to collect problem-specific data. However, these versatile models face a significant challenge: both their training and inference processes require substantial computational resources, time, and memory. Consequently, optimizing models of this kind to minimize these requirements is crucial. In this article, we demonstrate that, with minimal resources and in a remarkably short time, it is possible, using a relatively small pretrained LLM as a basis, to enhance a state-of-the-art model for a given language task without compromising its overall capabilities. Specifically, we present our use case, RigoChat 2, illustrating how LLMs can be adapted to achieve superior results in Spanish-language tasks.
- Europe > Spain (0.14)
- Europe > Netherlands (0.14)
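The abstract above describes adapting a pretrained LLM with minimal resources. One common way to do this is a low-rank update to the frozen base weights, in the spirit of LoRA; the toy sketch below (pure Python, hypothetical matrices, not the RigoChat 2 code) shows the core arithmetic: only the small factors A and B are trained, and the adapted weight is W + A @ B.

```python
# Minimal sketch of parameter-efficient adaptation: rather than
# retraining the full weight matrix W, learn a small low-rank
# update A @ B and add it to the frozen base weights.

def matmul(a, b):
    """Multiply two matrices represented as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def adapt(W, A, B, alpha=1.0):
    """Return W + alpha * (A @ B), the adapted weight matrix."""
    delta = matmul(A, B)
    return [[w + alpha * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

# Toy 2x2 base weight adapted with rank-1 factors: only the 4 numbers
# in A and B are trained, a saving that scales dramatically for the
# large square matrices inside a real LLM.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.5], [0.25]]   # 2x1
B = [[0.1, 0.2]]      # 1x2
W_adapted = adapt(W, A, B)
```

In a real model the same delta is applied to attention and feed-forward projections of dimension in the thousands, which is why a rank of 8 or 16 can adapt billions of parameters while training only a few million.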
Evaluating Large Language Models with Tests of Spanish as a Foreign Language: Pass or Fail?
Mayor-Rocher, Marina, Melero, Nina, Merino-Gómez, Elena, Grandury, María, Conde, Javier, Reviriego, Pedro
Large Language Models (LLMs) have been extensively evaluated on their ability to answer questions on many topics and on their performance on different natural language understanding tasks. Those tests are usually conducted in English, but most LLM users are not native English speakers. Therefore, it is of interest to analyze how LLMs understand other languages at different levels: from paragraphs to morphemes. In this paper, we evaluate the performance of state-of-the-art LLMs on TELEIA, a recently released benchmark with questions similar to those of Spanish exams for foreign students, covering topics such as reading comprehension, word formation, meaning and compositional semantics, and grammar. The results show that LLMs perform well at understanding Spanish but are still far from achieving the level of a native speaker in terms of grammatical competence.
- Europe > Spain > Galicia > Madrid (0.04)
- South America (0.04)
- North America > Central America (0.04)
- Government (0.46)
- Education > Assessment & Standards > Student Performance (0.35)
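Benchmarks like the one described above typically reduce to comparing a model's chosen option on each multiple-choice item against an answer key. The sketch below (an assumed setup, not the TELEIA evaluation code) shows this scoring step in its simplest form.

```python
# Score a multiple-choice exam: the fraction of items where the
# model's predicted option letter matches the answer key.

def score(predictions, answer_key):
    """Return accuracy over paired (prediction, gold answer) items."""
    correct = sum(p == a for p, a in zip(predictions, answer_key))
    return correct / len(answer_key)

# Hypothetical model outputs for four exam items; 3 of 4 match.
acc = score(["b", "a", "c", "a"], ["b", "a", "d", "a"])
```

Real evaluations add detail on top of this, for example per-topic breakdowns (reading comprehension vs. grammar) and prompt templates to extract the option letter from free-form model output, but the headline number is this ratio.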
CNER: A tool Classifier of Named-Entity Relationships
Torres, Jefferson A. Peña, De Piñerez, Raúl E. Gutiérrez
Spanish is occasionally adopted as the focus language for research, and as a result multiple projects are conducted in Spanish to explore language-specific nuances and challenges in NLP applications. Named-Entity Recognition [1], Machine Translation [2], and Semantic Relation Extraction [3], among other tasks, have been studied with a focus on Spanish-language data, allowing for a more nuanced understanding of the intricacies involved. In this context, language technologies and natural language processing (NLP) tools can support the identification of useful information in text and promote its understanding. In this paper we present Classifier for Named Entities Recognized (CNER), a linguistically aware online service that offers the possibility to test two main NLP tasks for the Spanish language: Named Entity Recognition (NER) and Relation Extraction (RE). This service, together with other projects on the Spanish language, has been evaluated and adapted as a web service. Specifically, CNER i) identifies mentions following the ACE standard, with entity types including Person (PER), Organisation (ORG), Facility (FAC), Location (LOC), Geographical/Political (GPE), Vehicle (VEH), and Weapon (WEA) [4], [5]; ii) displays the output of three different NER tools as a previous step to the RE task; and iii) offers entity relationship information through the tags GPE-AFF, PHYS, DISC, EMP-ORG, ART, and NON-REL, representing the relations between two entities [6].
- South America > Colombia > Valle del Cauca Department > Cali (0.05)
- North America > United States > California > Santa Clara County > Palo Alto (0.05)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.05)
- Europe > Portugal > Lisbon > Lisbon (0.05)
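To make the ACE-style entity typing above concrete, the toy sketch below (a deliberately simplified gazetteer lookup, not the CNER implementation, which uses trained NER tools) labels tokens with the same type tags the service exposes.

```python
# Toy mention tagger: assign ACE-style entity types (PER, ORG, GPE, ...)
# via a small gazetteer lookup; everything else gets the "outside" tag O.

ACE_GAZETTEER = {
    "Madrid": "GPE",   # geographical/political entity
    "ONU": "ORG",      # organisation
    "García": "PER",   # person
}

def tag_mentions(tokens):
    """Return (token, entity_type) pairs for a tokenized sentence."""
    return [(t, ACE_GAZETTEER.get(t, "O")) for t in tokens]

tags = tag_mentions(["García", "viajó", "a", "Madrid"])
```

A relation-extraction step like CNER's would then look at pairs of tagged mentions (here the PER and GPE) and classify the link between them into tags such as PHYS or GPE-AFF.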
Spanish Pre-trained BERT Model and Evaluation Data
Cañete, José, Chaperon, Gabriel, Fuentes, Rodrigo, Ho, Jou-Hui, Kang, Hojin, Pérez, Jorge
The Spanish language is one of the top 5 spoken languages in the world. Nevertheless, finding resources to train or evaluate Spanish language models is not an easy task. In this paper we help bridge this gap by presenting a BERT-based language model pre-trained exclusively on Spanish data. As a second contribution, we also compiled several tasks specifically for the Spanish language in a single repository, much in the spirit of the GLUE benchmark. By fine-tuning our pretrained Spanish model, we obtain better results than other BERT-based models pre-trained on multilingual corpora for most of the tasks, even achieving a new state of the art on some of them. We have publicly released our model, the pre-training data, and the compilation of the Spanish benchmarks.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- South America > Chile (0.05)
- South America > Paraguay > Asunción > Asunción (0.04)
EriBERTa: A Bilingual Pre-Trained Language Model for Clinical Natural Language Processing
de la Iglesia, Iker, Atutxa, Aitziber, Gojenola, Koldo, Barrena, Ander
The utilization of clinical reports for various secondary purposes, including health research and treatment monitoring, is crucial for enhancing patient care. Natural Language Processing (NLP) tools have emerged as valuable assets for extracting and processing relevant information from these reports. However, the availability of specialized language models for the clinical domain in Spanish has been limited. In this paper, we introduce EriBERTa, a bilingual domain-specific language model pre-trained on extensive medical and clinical corpora. We demonstrate that EriBERTa outperforms previous Spanish language models in the clinical domain, showcasing its superior capabilities in understanding medical texts and extracting meaningful information. Moreover, EriBERTa exhibits promising transfer learning abilities, allowing for knowledge transfer from one language to another. This aspect is particularly beneficial given the scarcity of Spanish clinical data.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > China > Hong Kong (0.04)
- North America > Montserrat (0.04)
Regionalized models for Spanish language variations based on Twitter
Tellez, Eric S., Moctezuma, Daniela, Miranda, Sabino, Graff, Mario, Ruiz, Guillermo
Spanish is one of the most spoken languages in the world, but it is not necessarily written and spoken the same way in different countries. Understanding local language variations can help improve model performance on regional tasks, both by capturing local structures and by better interpreting a message's content. For instance, consider a machine learning engineer who automates some language classification task for a particular region, or a social scientist trying to understand a regional event with echoes on social media; both can take advantage of dialect-based language models to understand what is happening with more contextual information and hence more precision. This manuscript presents and describes a set of regionalized resources for the Spanish language built on four years of public Twitter messages geotagged in 26 Spanish-speaking countries. We introduce word embeddings based on FastText, language models based on BERT, and per-region sample corpora. We also provide a broad comparison among regions covering lexical and semantic similarities, as well as examples of using the regional resources on message classification tasks.
- North America > United States (0.14)
- South America > Argentina (0.05)
- North America > Cuba (0.04)
- Information Technology > Services (0.93)
- Health & Medicine (0.68)
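The cross-region lexical comparison mentioned in the abstract above can be illustrated with a much simpler statistic than the paper's embedding-based analysis: Jaccard similarity over regional vocabularies. The sketch below uses a hypothetical handful of words per region purely for illustration.

```python
# Compare two regional Spanish vocabularies with Jaccard similarity:
# the size of the shared vocabulary over the size of the combined one.

def jaccard(vocab_a, vocab_b):
    """Jaccard similarity between two word collections, in [0, 1]."""
    a, b = set(vocab_a), set(vocab_b)
    return len(a & b) / len(a | b)

# Toy vocabularies: regional variants for "bus"/"sidewalk" differ,
# while "coche" is shared, so the overlap is small but nonzero.
mx = ["camión", "banqueta", "coche"]   # Mexico (illustrative)
es = ["autobús", "acera", "coche"]     # Spain (illustrative)
sim = jaccard(mx, es)
```

Embedding-based comparisons refine this idea: instead of exact word overlap, they measure whether the same word occupies a similar position in each region's FastText space, which also captures shared words whose meanings drift between dialects.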
Learning about Spanish dialects through Twitter
Gonçalves, Bruno, Sánchez, David
This paper maps the large-scale variation of the Spanish language by employing a corpus based on geographically tagged Twitter messages. Lexical dialects are extracted from an analysis of variants of tens of concepts. The resulting maps show linguistic variation on an unprecedented scale across the globe. We discuss the properties of the main dialects within a machine learning approach and find that varieties spoken in urban areas have an international character, in contrast to rural areas, where dialects show greater regional uniformity.
- North America > Mexico (0.14)
- Europe > Spain (0.05)
- South America > Colombia (0.05)