AITopics | Ceuta

Ceuta

MEL: Legal Spanish Language Model

Sánchez, David Betancur, García, Nuria Aldama, Jiménez, Álvaro Barbero, Nieto, Marta Guerrero, Morales, Patricia Marsà, Salas, Nicolás Serrano, Hernán, Carlos García, Coll, Pablo Haya, Ponsoda, Elena Montiel, Ibáñez, Pablo Calleja

arXiv.org Artificial Intelligence, Jan-27-2025

Legal texts, characterized by complex and specialized terminology, present a significant challenge for Language Models. Adding an underrepresented language, such as Spanish, to the mix makes it even more challenging. While pre-trained models like XLM-RoBERTa have shown capabilities in handling multilingual corpora, their performance on domain-specific documents remains underexplored. This paper presents the development and evaluation of MEL, a legal language model based on XLM-RoBERTa-large, fine-tuned on legal documents such as the BOE (Boletín Oficial del Estado, the official Spanish gazette of laws) and congress texts. We detail the data collection, processing, training, and evaluation processes. Evaluation benchmarks show a significant improvement over baseline models in understanding legal Spanish. We also present case studies demonstrating the model's application to new legal texts, highlighting its potential to achieve top results across different NLP tasks.

artificial intelligence, large language model, natural language, (19 more...)

2501.16011

Country:

Europe > Spain > Galicia > Madrid (0.04)
Europe > Spain > Ceuta (0.04)

Genre: Research Report (0.64)

Industry:

Law (1.00)
Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)

BordIRlines: A Dataset for Evaluating Cross-lingual Retrieval-Augmented Generation

Li, Bryan, Haider, Samar, Luo, Fiona, Agashe, Adwait, Callison-Burch, Chris

arXiv.org Artificial Intelligence, Oct-1-2024

Large language models excel at creative generation but continue to struggle with the issues of hallucination and bias. While retrieval-augmented generation (RAG) provides a framework for grounding LLMs' responses in accurate and up-to-date information, it still raises the question of bias: which sources should be selected for inclusion in the context? And how should their importance be weighted? In this paper, we study the challenge of cross-lingual RAG and present a dataset to investigate the robustness of existing systems at answering queries about geopolitical disputes, which exist at the intersection of linguistic, cultural, and political boundaries. Our dataset is sourced from Wikipedia pages containing information relevant to the given queries and we investigate the impact of including additional context, as well as the composition of this context in terms of language and source, on an LLM's response. Our results show that existing RAG systems continue to be challenged by cross-lingual use cases and suffer from a lack of consistency when they are provided with competing information in multiple languages. We present case studies to illustrate these issues and outline steps for future research to address these challenges. We make our dataset and code publicly available at https://github.com/manestay/bordIRlines.

query, rag, territory, (16 more...)

2410.01171

Country:

Europe > United Kingdom (0.28)
Africa > Middle East > Morocco (0.14)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
(27 more...)

Genre: Research Report > New Finding (0.54)

Industry: Government (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Ships smuggling Russian oil spotted in satellite images by AI

New Scientist, Apr-23-2024, 15:00:13 GMT

Artificial intelligence can spot stealthy cargo transfers between "dark ships", vessels that have switched off their identification transponders. This could make it much easier to track the shadow fleet secretly transferring crude oil in defiance of international sanctions. Such ship-to-ship transfers at sea have skyrocketed since Russia's full-scale invasion of Ukraine spurred the European Union to ban the import of Russian crude oil.

artificial intelligence, satellite image, ship smuggling russian oil, (1 more...)

Country:

Europe > Ukraine (0.34)
Europe > Russia (0.34)
Asia > Russia (0.34)
Europe > Spain > Ceuta (0.14)

Industry: Energy > Oil & Gas (1.00)

Technology: Information Technology > Artificial Intelligence (1.00)

Analysis of tidal flows through the Strait of Gibraltar using Dynamic Mode Decomposition

Dias, Sathsara, Surasinghe, Sudam, Priyankara, Kanaththa, Budišić, Marko, Pratt, Larry, Sanchez-Garrido, José C., Bollt, Erik M.

arXiv.org Machine Learning, Nov-2-2023

The Strait of Gibraltar is a region characterized by intricate oceanic sub-mesoscale features, influenced by topography, tidal forces, instabilities, and nonlinear hydraulic processes, all governed by the nonlinear equations of fluid motion. In this study, we aim to uncover the underlying physics of these phenomena within 3D MIT general circulation model simulations, including waves, eddies, and gyres. To achieve this, we employ Dynamic Mode Decomposition (DMD) to break down simulation snapshots into Koopman modes, with distinct exponential growth/decay rates and oscillation frequencies. Our objectives encompass evaluating DMD's efficacy in capturing known features, unveiling new elements, ranking modes, and exploring order reduction. We also introduce modifications to enhance DMD's robustness, numerical accuracy, and robustness of eigenvalues. DMD analysis yields a comprehensive understanding of flow patterns, internal wave formation, and the dynamics of the Strait of Gibraltar, its meandering behaviors, and the formation of a secondary gyre, notably the Western Alboran Gyre, as well as the propagation of Kelvin and coastal-trapped waves along the African coast. In doing so, it significantly advances our comprehension of intricate oceanographic phenomena and underscores the immense utility of DMD as an analytical tool for such complex datasets, suggesting that DMD could serve as a valuable addition to the toolkit of oceanographers.
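
As a rough illustration of the decomposition this abstract describes, the following is a minimal sketch of textbook exact DMD on synthetic data. It is not the authors' modified algorithm and does not use their MITgcm simulations; the function names `exact_dmd`, `continuous_rates`, and `toy_snapshots`, and all parameter values, are assumptions made for this example. It shows how each mode comes with an exponential growth/decay rate and an oscillation frequency.

```python
import numpy as np


def exact_dmd(snapshots, rank):
    """Textbook exact DMD of an (n_state, n_time) snapshot matrix.

    Returns the discrete-time eigenvalues and the corresponding DMD modes
    (one mode per column), truncated to `rank`.
    """
    X, Y = snapshots[:, :-1], snapshots[:, 1:]      # pairs (x_k, x_{k+1})
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank, :]  # rank truncation
    YV_sinv = Y @ Vh.conj().T / s                   # Y V S^{-1}
    A_tilde = U.conj().T @ YV_sinv                  # reduced linear operator
    eigvals, W = np.linalg.eig(A_tilde)
    modes = YV_sinv @ W                             # exact DMD modes
    return eigvals, modes


def continuous_rates(eigvals, dt):
    """Convert discrete eigenvalues to (growth rate, angular frequency)."""
    omega = np.log(eigvals) / dt
    return omega.real, omega.imag


def toy_snapshots(dt, n_time=200, n_state=10, seed=0):
    """Hypothetical test data: one decaying and one steady oscillation."""
    rng = np.random.default_rng(seed)
    t = dt * np.arange(n_time)
    patterns = rng.standard_normal((n_state, 4))    # assumed spatial patterns
    latent = np.vstack([
        np.exp(-0.1 * t) * np.cos(2.0 * t),         # decay rate 0.1, frequency 2
        np.exp(-0.1 * t) * np.sin(2.0 * t),
        np.cos(5.0 * t),                            # no decay, frequency 5
        np.sin(5.0 * t),
    ])
    return patterns @ latent


if __name__ == "__main__":
    dt = 0.1
    eigvals, modes = exact_dmd(toy_snapshots(dt), rank=4)
    growth, freq = continuous_rates(eigvals, dt)
    for g, f in zip(growth, freq):
        print(f"growth rate {g:+.3f}, angular frequency {f:+.3f}")
```

On real simulation output the rank would have to be chosen from the singular value spectrum, and noise would make the recovered rates less clean than in this synthetic case.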

artificial intelligence, gibraltar, machine learning, (15 more...)

2311.01377

Country:

Europe > Gibraltar (0.82)
Atlantic Ocean > Mediterranean Sea > Strait of Gibraltar (0.82)
Atlantic Ocean > Mediterranean Sea > Alboran Sea (0.04)
(12 more...)

Genre: Research Report > New Finding (0.66)

Industry: Energy (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)
Information Technology > Artificial Intelligence > Machine Learning (0.67)

Prompting as Probing: Using Language Models for Knowledge Base Construction

Alivanistos, Dimitrios, Santamaría, Selene Báez, Cochez, Michael, Kalo, Jan-Christoph, van Krieken, Emile, Thanapalasingam, Thiviyan

arXiv.org Artificial Intelligence, Jun-19-2023

Language Models (LMs) have proven to be useful in various downstream applications, such as summarisation, translation, question answering and text classification. LMs are becoming increasingly important tools in Artificial Intelligence, because of the vast quantity of information they can store. In this work, we present ProP (Prompting as Probing), which utilizes GPT-3, a large Language Model originally proposed by OpenAI in 2020, to perform the task of Knowledge Base Construction (KBC). ProP implements a multi-step approach that combines a variety of prompting techniques to achieve this. Our results show that manual prompt curation is essential, that the LM must be encouraged to give answer sets of variable lengths, in particular including empty answer sets, that true/false questions are a useful device to increase precision on suggestions generated by the LM, that the size of the LM is a crucial factor, and that a dictionary of entity aliases improves the LM score. Our evaluation study indicates that these proposed techniques can substantially enhance the quality of the final predictions: ProP won track 2 of the LM-KBC competition, outperforming the baseline by 36.4 percentage points.

large language model, machine learning, natural language, (19 more...)

2208.11057

Country:

Europe > Spain > Castilla-La Mancha (0.14)
Africa > Eswatini (0.14)
Europe > Ukraine (0.04)
(73 more...)

Genre: Research Report > New Finding (0.86)

Industry:

Media (1.00)
Automobiles & Trucks > Manufacturer (0.93)
Government (0.68)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.76)

Forecasting COVID-19 spreading through an ensemble of classical and machine learning models: Spain's case study

Cacha, Ignacio Heredia, Díaz, Judith Sainz-Pardo, Melguizo, María Castrillo, García, Álvaro López

arXiv.org Artificial Intelligence, Aug-12-2022

In this work we evaluate the applicability of an ensemble of population models and machine learning models to predict the near future evolution of the COVID-19 pandemic, with a particular use case in Spain. We rely solely on open and public datasets, fusing incidence, vaccination, human mobility and weather data to feed our machine learning models (Random Forest, Gradient Boosting, k-Nearest Neighbours and Kernel Ridge Regression). We use the incidence data to adjust classic population models (Gompertz, Logistic, Richards, Bertalanffy) in order to better capture the trend of the data. We then ensemble these two families of models in order to obtain a more robust and accurate prediction. Furthermore, we have observed an improvement in the predictions obtained with machine learning models as we add new features (vaccines, mobility, climatic conditions), analyzing the importance of each of them using Shapley Additive Explanation values. As in any other modelling work, the quality of the data and of the predictions has several limitations, and therefore they must be seen from a critical standpoint, as we discuss in the text. Our work concludes that the ensemble use of these models improves the individual predictions (using only machine learning models or only population models) and can be applied, with caution, in cases when compartmental models cannot be utilized due to the lack of relevant data.
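
To make the two ingredients named in this abstract concrete, the sketch below fits a Gompertz curve to cumulative counts and averages forecasts from several models. It is an illustration under stated assumptions, not the paper's pipeline: the asymptote is assumed known so that the fit reduces to a straight-line regression, the plain mean is only one possible aggregation rule, and the names `gompertz`, `fit_gompertz_loglinear`, and `ensemble_mean` as well as all numbers are invented for this example.

```python
import numpy as np


def gompertz(t, a, b, c):
    """Gompertz growth curve: a * exp(-b * exp(-c * t))."""
    return a * np.exp(-b * np.exp(-c * t))


def fit_gompertz_loglinear(t, y, a):
    """Estimate b and c for a Gompertz curve whose asymptote `a` is assumed known.

    Since ln(-ln(y / a)) = ln(b) - c * t, a straight-line fit suffices.
    Requires 0 < y < a for every observation.
    """
    z = np.log(-np.log(y / a))
    slope, intercept = np.polyfit(t, z, 1)
    return float(np.exp(intercept)), float(-slope)


def ensemble_mean(predictions):
    """Combine forecasts from several models by a plain average (an assumption)."""
    return np.mean(np.stack(predictions), axis=0)


if __name__ == "__main__":
    # Synthetic cumulative counts generated from a known Gompertz curve.
    t_obs = np.arange(0, 30, dtype=float)
    y_obs = gompertz(t_obs, 1000.0, 5.0, 0.3)
    b_hat, c_hat = fit_gompertz_loglinear(t_obs, y_obs, a=1000.0)

    # Forecast a few days ahead and blend with a stand-in for an ML forecast.
    t_future = np.arange(30, 35, dtype=float)
    population_forecast = gompertz(t_future, 1000.0, b_hat, c_hat)
    ml_forecast = population_forecast * 1.02  # placeholder, not a trained model
    print(ensemble_mean([population_forecast, ml_forecast]))
```

In practice the asymptote would be fitted together with the other parameters by nonlinear least squares, and the aggregation could use a median or learned weights instead of the mean.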

artificial intelligence, machine learning, prediction, (19 more...)

doi: 10.1038/s41598-023-33795-8

2207.05753

Country:

North America > Costa Rica > Heredia Province > Heredia (0.04)
Asia > India (0.04)
North America > Mexico (0.04)
(18 more...)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.46)

Industry:

Health & Medicine > Therapeutic Area > Vaccines (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.34)

In post-pandemic Europe, migrants will face digital fortress

PBS NewsHour, May-31-2021, 23:03:04 GMT

As the world begins to travel again, Europe is sending migrants a loud message: Stay away! Greek border police are firing bursts of deafening noise from an armored truck over the frontier into Turkey. Mounted on the vehicle, the long-range acoustic device, or "sound cannon," is the size of a small TV set but can match the volume of a jet engine. It's part of a vast array of physical and experimental new digital barriers being installed and tested during the quiet months of the coronavirus pandemic at the 200-kilometer (125-mile) Greek border with Turkey to stop people entering the European Union illegally. Nearby observation towers are being fitted with long-range cameras, night vision, and multiple sensors.

border, europe, migrant, (11 more...)

Country:

Europe > Greece (0.61)
Asia > Middle East > Republic of Türkiye (0.57)
Africa > Middle East > Morocco (0.15)
(8 more...)

Industry: Government > Regional Government > Europe Government (1.00)

Technology:

Information Technology > Security & Privacy (0.71)
Information Technology > Artificial Intelligence (0.50)
