Ceuta
IberFire -- a detailed creation of a spatio-temporal dataset for wildfire risk assessment in Spain
Erzibengoa, Julen, Gómez-Omella, Meritxell, Goienetxea, Izaro
Wildfires pose a threat to ecosystems, economies and public safety, particularly in Mediterranean regions such as Spain. Accurate predictive models require high-resolution spatio-temporal data to capture the complex dynamics of environmental and human factors. To address the scarcity of fine-grained wildfire datasets in Spain, we introduce IberFire: a spatio-temporal dataset with 1 km × 1 km × 1 day resolution, covering mainland Spain and the Balearic Islands from December 2007 to December 2024. IberFire integrates 120 features across eight categories: auxiliary data, fire history, geography, topography, meteorology, vegetation indices, human activity and land cover. All features and processing rely on open-access data and tools, with a publicly available codebase ensuring transparency and applicability. IberFire offers enhanced spatial granularity and feature diversity compared to existing European datasets, and provides a reproducible framework. It supports advanced wildfire risk modelling via Machine Learning and Deep Learning, facilitates climate trend analysis, and informs fire prevention and land management strategies. The dataset is freely available on Zenodo to promote open research and collaboration.
- Europe > Spain > Balearic Islands (0.24)
- Europe > Spain > Melilla (0.04)
- Europe > Spain > Ceuta (0.04)
- Government (0.68)
- Law Enforcement & Public Safety (0.49)
- Food & Agriculture > Agriculture (0.48)
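IberFire indexes every feature on a 1 km × 1 km × 1-day grid. As a minimal sketch of how a point observation (for example, a fire ignition) could be assigned to a grid cell and day, the snippet below assumes a projected coordinate system in metres; the grid origin `ORIGIN_X_M`/`ORIGIN_Y_M` and `START_DATE` are hypothetical placeholders, not IberFire's actual projection or extent.

```python
from datetime import date

# Hypothetical grid parameters: the origin and start date are illustrative
# assumptions, not IberFire's actual projection, extent or first day.
CELL_SIZE_M = 1000              # 1 km x 1 km cells
ORIGIN_X_M = 0.0                # hypothetical western edge (projected metres)
ORIGIN_Y_M = 0.0                # hypothetical southern edge (projected metres)
START_DATE = date(2007, 12, 1)  # hypothetical first day of the series


def cell_index(x_m: float, y_m: float, day: date) -> tuple[int, int, int]:
    """Map a projected coordinate (metres) and a date to (row, col, t)."""
    col = int((x_m - ORIGIN_X_M) // CELL_SIZE_M)
    row = int((y_m - ORIGIN_Y_M) // CELL_SIZE_M)
    t = (day - START_DATE).days  # one time step per day
    if col < 0 or row < 0 or t < 0:
        raise ValueError("point lies outside the grid or before the start date")
    return row, col, t


print(cell_index(2500.0, 999.9, date(2007, 12, 3)))
```

Here `cell_index(2500.0, 999.9, date(2007, 12, 3))` returns `(0, 2, 2)`. Floor division keeps a point lying exactly on a cell's western or southern edge inside that cell.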
MEL: Legal Spanish Language Model
Sánchez, David Betancur, García, Nuria Aldama, Jiménez, Álvaro Barbero, Nieto, Marta Guerrero, Morales, Patricia Marsà, Salas, Nicolás Serrano, Hernán, Carlos García, Coll, Pablo Haya, Ponsoda, Elena Montiel, Ibáñez, Pablo Calleja
Legal texts, characterized by complex and specialized terminology, present a significant challenge for language models. Adding an underrepresented language such as Spanish makes the task even harder. While pre-trained models like XLM-RoBERTa have shown capabilities in handling multilingual corpora, their performance on domain-specific documents remains underexplored. This paper presents the development and evaluation of MEL, a legal language model based on XLM-RoBERTa-large and fine-tuned on legal documents such as the BOE (Boletín Oficial del Estado, Spain's official state gazette) and texts from the Spanish Congress. We detail the data collection, processing, training, and evaluation processes. Evaluation benchmarks show a significant improvement over baseline models in understanding legal Spanish. We also present case studies demonstrating the model's application to new legal texts, highlighting its potential to achieve top results across different NLP tasks.
- Law (1.00)
- Government > Regional Government (0.46)
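The abstract does not state the fine-tuning objective. Assuming MEL's adaptation to legal Spanish used continued masked-language-model training (the objective XLM-RoBERTa was pretrained with), the standard BERT-style masking step can be sketched in plain Python as below; `mask_tokens`, the 15% selection rate and the 80/10/10 split are illustrative defaults, not settings confirmed by the paper.

```python
import random

IGNORE = -100  # label value conventionally ignored by the training loss


def mask_tokens(token_ids, mask_id, vocab_size, special_ids=frozenset(),
                mlm_prob=0.15, rng=None):
    """Return (inputs, labels) for masked-language-model training.

    Each non-special token is selected with probability mlm_prob. A selected
    token becomes mask_id 80% of the time, a random token 10% of the time and
    stays unchanged 10% of the time. Unselected positions get IGNORE labels.
    """
    rng = rng or random.Random()
    inputs, labels = [], []
    for tok in token_ids:
        if tok in special_ids or rng.random() >= mlm_prob:
            inputs.append(tok)
            labels.append(IGNORE)
            continue
        labels.append(tok)  # the model must recover the original token
        r = rng.random()
        if r < 0.8:
            inputs.append(mask_id)
        elif r < 0.9:
            inputs.append(rng.randrange(vocab_size))
        else:
            inputs.append(tok)
    return inputs, labels
```

In a real pipeline a tokenizer supplies `token_ids`, `mask_id` and the special-token ids. Re-drawing the mask each time a sequence is seen (dynamic masking) is the approach RoBERTa introduced.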
BordIRlines: A Dataset for Evaluating Cross-lingual Retrieval-Augmented Generation
Li, Bryan, Haider, Samar, Luo, Fiona, Agashe, Adwait, Callison-Burch, Chris
Large language models excel at creative generation but continue to struggle with the issues of hallucination and bias. While retrieval-augmented generation (RAG) provides a framework for grounding LLMs' responses in accurate and up-to-date information, it still raises the question of bias: which sources should be selected for inclusion in the context? And how should their importance be weighted? In this paper, we study the challenge of cross-lingual RAG and present a dataset to investigate the robustness of existing systems at answering queries about geopolitical disputes, which exist at the intersection of linguistic, cultural, and political boundaries. Our dataset is sourced from Wikipedia pages containing information relevant to the given queries and we investigate the impact of including additional context, as well as the composition of this context in terms of language and source, on an LLM's response. Our results show that existing RAG systems continue to be challenged by cross-lingual use cases and suffer from a lack of consistency when they are provided with competing information in multiple languages. We present case studies to illustrate these issues and outline steps for future research to address these challenges. We make our dataset and code publicly available at https://github.com/manestay/bordIRlines.
- Europe > United Kingdom (0.28)
- Africa > Middle East > Morocco (0.14)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
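One simple way to quantify the inconsistency the authors report is to check, for each query, whether the model gives the same answer under every context configuration (for example no context, English-only passages, local-language passages, or a mix). The function below is a hypothetical metric for illustration, not the paper's evaluation code; the configuration names and the light normalisation are assumptions.

```python
def consistency_rate(answers_by_query: dict[str, dict[str, str]]) -> float:
    """Fraction of queries whose answer agrees across all context configurations.

    answers_by_query maps a query id to {configuration name: model answer}.
    Answers are compared after stripping whitespace and lower-casing.
    """
    if not answers_by_query:
        return 0.0
    consistent = 0
    for answers in answers_by_query.values():
        normalised = {a.strip().lower() for a in answers.values()}
        if len(normalised) == 1:
            consistent += 1
    return consistent / len(answers_by_query)


answers = {
    "q1": {"none": "Spain", "en": "Spain", "local": "spain "},
    "q2": {"none": "Spain", "en": "Morocco", "local": "Spain"},
}
print(consistency_rate(answers))  # 0.5
```

Exact-match comparison is deliberately strict; free-form answers would need a more lenient matcher.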
Ships smuggling Russian oil spotted in satellite images by AI
Artificial intelligence can spot stealthy cargo transfers between "dark ships", vessels that have switched off their identification transponders. This could make it much easier to track the shadow fleet secretly transferring crude oil in defiance of international sanctions. Such ship-to-ship transfers at sea have skyrocketed since Russia's full-scale invasion of Ukraine spurred the European Union to ban the import of Russian crude oil.
Analysis of tidal flows through the Strait of Gibraltar using Dynamic Mode Decomposition
Dias, Sathsara, Surasinghe, Sudam, Priyankara, Kanaththa, Budišić, Marko, Pratt, Larry, Sanchez-Garrido, José C., Bollt, Erik M.
The Strait of Gibraltar is a region characterized by intricate oceanic sub-mesoscale features, influenced by topography, tidal forces, instabilities, and nonlinear hydraulic processes, all governed by the nonlinear equations of fluid motion. In this study, we aim to uncover the underlying physics of these phenomena within 3D MIT general circulation model simulations, including waves, eddies, and gyres. To achieve this, we employ Dynamic Mode Decomposition (DMD) to break down simulation snapshots into Koopman modes, with distinct exponential growth/decay rates and oscillation frequencies. Our objectives encompass evaluating DMD's efficacy in capturing known features, unveiling new elements, ranking modes, and exploring order reduction. We also introduce modifications to enhance DMD's robustness and numerical accuracy, including the robustness of the computed eigenvalues. DMD analysis yields a comprehensive understanding of flow patterns, internal wave formation, and the dynamics of the Strait of Gibraltar, its meandering behaviors, and the formation of a secondary gyre, notably the Western Alboran Gyre, as well as the propagation of Kelvin and coastal-trapped waves along the African coast. In doing so, it significantly advances our comprehension of intricate oceanographic phenomena and underscores the immense utility of DMD as an analytical tool for such complex datasets, suggesting that DMD could serve as a valuable addition to the toolkit of oceanographers.
- Europe > Gibraltar (0.82)
- Atlantic Ocean > Mediterranean Sea > Strait of Gibraltar (0.82)
- Atlantic Ocean > Mediterranean Sea > Alboran Sea (0.04)
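The decomposition of snapshots into modes with growth/decay rates and frequencies can be illustrated with the standard exact-DMD algorithm: an SVD of the snapshot matrix, a reduced linear operator, and its eigenvalues and modes. This NumPy sketch does not include the robustness and accuracy modifications the authors introduce; the truncation rank `r` and time step `dt` are user choices.

```python
import numpy as np


def exact_dmd(X: np.ndarray, r: int, dt: float = 1.0):
    """Exact DMD of snapshots X (n_points x n_times), truncated to rank r.

    Returns discrete eigenvalues, DMD modes (columns) and continuous-time
    exponents omega, whose real parts are growth/decay rates and whose
    imaginary parts are angular frequencies.
    """
    X1, X2 = X[:, :-1], X[:, 1:]  # pairs of consecutive snapshots
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    U, s, V = U[:, :r], s[:r], Vh[:r, :].conj().T
    # Reduced operator: projection of the best-fit linear map X2 ≈ A X1
    A_tilde = (U.conj().T @ X2 @ V) / s
    eigvals, W = np.linalg.eig(A_tilde)
    modes = ((X2 @ V) / s) @ W  # exact DMD modes
    omega = np.log(eigvals.astype(complex)) / dt
    return eigvals, modes, omega


# Usage: a decaying, rotating two-mode field should give omega ≈ -0.5 ± 2i
dt = 0.1
t = np.arange(60) * dt
x = np.linspace(0.0, 1.0, 40)
X = (np.outer(np.sin(np.pi * x), np.cos(2 * t))
     + np.outer(np.sin(2 * np.pi * x), np.sin(2 * t))) * np.exp(-0.5 * t)
eigvals, modes, omega = exact_dmd(X, r=2, dt=dt)
print(np.round(omega[np.argsort(omega.imag)], 6))
```

Ranking the resulting modes, for example by amplitude or by decay rate, is one way to approach the mode-ranking and order-reduction questions the abstract raises.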
Prompting as Probing: Using Language Models for Knowledge Base Construction
Alivanistos, Dimitrios, Santamaría, Selene Báez, Cochez, Michael, Kalo, Jan-Christoph, van Krieken, Emile, Thanapalasingam, Thiviyan
Language Models (LMs) have proven to be useful in various downstream applications, such as summarisation, translation, question answering and text classification. LMs are becoming increasingly important tools in Artificial Intelligence, because of the vast quantity of information they can store. In this work, we present ProP (Prompting as Probing), which utilizes GPT-3, a large Language Model originally proposed by OpenAI in 2020, to perform the task of Knowledge Base Construction (KBC). ProP implements a multi-step approach that combines a variety of prompting techniques to achieve this. Our results show that manual prompt curation is essential, that the LM must be encouraged to give answer sets of variable lengths, in particular including empty answer sets, that true/false questions are a useful device to increase precision on suggestions generated by the LM, that the size of the LM is a crucial factor, and that a dictionary of entity aliases improves the LM score. Our evaluation study indicates that these proposed techniques can substantially enhance the quality of the final predictions: ProP won track 2 of the LM-KBC competition, outperforming the baseline by 36.4 percentage points.
- Europe > Spain > Castilla-La Mancha (0.14)
- Africa > Eswatini (0.14)
- Europe > Ukraine (0.04)
- Media (1.00)
- Automobiles & Trucks > Manufacturer (0.93)
- Government (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.76)
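The abstract lists the ingredients of ProP's multi-step approach: answer sets of variable length (including empty sets), true/false questions to raise precision, and a dictionary of entity aliases. A minimal sketch of how these steps could be chained is shown below. `lm` stands in for any text-completion callable (ProP itself used GPT-3), and `prop_predict`, the prompt templates and the comma-separated answer format are hypothetical.

```python
from typing import Callable


def prop_predict(subject: str, answer_prompt: str, verify_prompt: str,
                 lm: Callable[[str], str], aliases: dict[str, str]) -> list[str]:
    """Hypothetical ProP-style pipeline: generate, verify, then normalise aliases."""
    reply = lm(answer_prompt.format(subject=subject)).strip()
    # Variable-length answers: an empty reply or "None" is a valid empty set
    if not reply or reply.lower() == "none":
        return []
    candidates = [c.strip() for c in reply.split(",") if c.strip()]
    kept: list[str] = []
    for cand in candidates:
        # True/false check on each candidate to filter out wrong suggestions
        verdict = lm(verify_prompt.format(subject=subject, object=cand))
        if verdict.strip().lower().startswith("true"):
            canonical = aliases.get(cand, cand)  # map alias to canonical name
            if canonical not in kept:
                kept.append(canonical)
    return kept


def fake_lm(prompt: str) -> str:
    """Stand-in for a real LM, for demonstration only."""
    if prompt.startswith("Which countries"):
        return "Spain, España, Atlantis"
    return "false" if "Atlantis" in prompt else "true"


print(prop_predict("Portugal",
                   "Which countries border {subject}? List them or say None.",
                   "Does {object} border {subject}? Answer true or false.",
                   fake_lm, {"España": "Spain"}))  # ['Spain']
```

Here the verification step drops the hallucinated "Atlantis", and the alias dictionary merges "España" into "Spain".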
Forecasting COVID-19 spreading through an ensemble of classical and machine learning models: Spain's case study
Cacha, Ignacio Heredia, Díaz, Judith Sainz-Pardo, Melguizo, María Castrillo, García, Álvaro López
In this work we evaluate the applicability of an ensemble of population models and machine learning models to predict the near-future evolution of the COVID-19 pandemic, with a particular use case in Spain. We rely solely on open and public datasets, fusing incidence, vaccination, human mobility and weather data to feed our machine learning models (Random Forest, Gradient Boosting, k-Nearest Neighbours and Kernel Ridge Regression). We use the incidence data to fit classic population models (Gompertz, Logistic, Richards, Bertalanffy) to better capture the trend of the data. We then ensemble these two families of models to obtain a more robust and accurate prediction. Furthermore, we observed that the predictions of the machine learning models improve as we add new features (vaccines, mobility, climatic conditions), and we analyze the importance of each feature using SHapley Additive exPlanations (SHAP) values. As in any other modelling work, data and prediction quality have several limitations, so the predictions must be viewed critically, as we discuss in the text. Our work concludes that the ensemble of these models improves on the individual predictions (using only machine learning models or only population models) and can be applied, with caution, in cases where compartmental models cannot be used due to the lack of relevant data.
- North America > Costa Rica > Heredia Province > Heredia (0.04)
- Asia > India (0.04)
- North America > Mexico (0.04)
- Research Report > New Finding (0.68)
- Research Report > Experimental Study (0.46)
- Health & Medicine > Therapeutic Area > Vaccines (1.00)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
- Health & Medicine > Therapeutic Area > Immunology (1.00)
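The two model families can be combined in several ways. As an illustration only, the sketch below writes two of the named population models (logistic and Gompertz, in common textbook parameterisations), averages them into one population-model forecast, and then averages that with a machine learning forecast. The parameter values and the ML forecast are made up, and the equal weighting is an assumption, not necessarily the scheme used in the paper.

```python
import numpy as np


def logistic(t, K, r, t0):
    """Logistic curve for cumulative cases: capacity K, growth rate r, midpoint t0."""
    return K / (1.0 + np.exp(-r * (t - t0)))


def gompertz(t, K, b, c):
    """Gompertz curve for cumulative cases: capacity K, displacement b, rate c."""
    return K * np.exp(-b * np.exp(-c * t))


def ensemble_mean(*forecasts):
    """Unweighted mean of forecasts aligned on the same time steps."""
    stacked = np.vstack([np.asarray(f, dtype=float) for f in forecasts])
    return stacked.mean(axis=0)


# Usage with made-up parameters and a made-up ML forecast
t = np.arange(30, 37)  # days to forecast
population = ensemble_mean(logistic(t, K=1e5, r=0.2, t0=30),
                           gompertz(t, K=1e5, b=8.0, c=0.1))
ml_forecast = np.linspace(5.0e4, 6.5e4, t.size)  # stands in for e.g. a Random Forest output
print(ensemble_mean(population, ml_forecast).round())
```

Averaging within each family first keeps the population models and the machine learning models at equal weight regardless of how many members each family has.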
Cell Grid Architecture for Maritime Route Prediction on AIS Data Streams
Amariei, Ciprian, Diac, Paul, Onica, Emanuel, Roşca, Valentin
The 2018 Grand Challenge targets the problem of accurate predictions on data streams produced by automatic identification system (AIS) equipment, describing naval traffic. This paper reports the technical details of a custom solution that exposes multiple tuning parameters, making configurability one of its main strengths. Our solution employs a cell grid architecture essentially based on a sequence of hash tables, built specifically for the targeted use case. This makes it particularly effective for prediction on AIS data, achieving high accuracy and scalable performance. Moreover, besides the basic supervised mode, the proposed architecture also accommodates an optional semi-supervised learning process.
- Oceania > New Zealand > North Island > Waikato > Hamilton (0.06)
- Europe > Romania > Nord-Est Development Region > Iași County > Iași (0.06)
- Atlantic Ocean > Mediterranean Sea (0.05)
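As a rough illustration of a cell grid built from a sequence of hash tables, the sketch below quantises positions into latitude/longitude cells at several resolutions. It keeps one hash table (a Python dict) per resolution that counts the destinations seen from each cell, and predicts from the finest cell that has data, falling back to coarser ones. Predicting a destination, the cell sizes and the fallback order are all assumptions for illustration; the abstract does not give the solution's actual tuning parameters.

```python
from collections import Counter, defaultdict


def cell_key(lat: float, lon: float, size_deg: float) -> tuple[int, int]:
    """Quantise a position into a (row, col) grid cell of size_deg degrees."""
    return int(lat // size_deg), int(lon // size_deg)


class CellGridPredictor:
    """Hypothetical multi-resolution cell grid: one hash table per cell size."""

    def __init__(self, cell_sizes=(0.05, 0.2, 1.0)):
        self.cell_sizes = cell_sizes  # finest resolution first
        self.tables = [defaultdict(Counter) for _ in cell_sizes]

    def observe(self, lat: float, lon: float, destination: str) -> None:
        """Supervised update: record the destination reported from this position."""
        for size, table in zip(self.cell_sizes, self.tables):
            table[cell_key(lat, lon, size)][destination] += 1

    def predict(self, lat: float, lon: float):
        """Most frequent destination in the finest cell with data, else None."""
        for size, table in zip(self.cell_sizes, self.tables):
            counts = table.get(cell_key(lat, lon, size))  # .get avoids creating entries
            if counts:
                return counts.most_common(1)[0][0]
        return None


grid = CellGridPredictor()
for dest in ("ALGECIRAS", "ALGECIRAS", "TANGER MED"):
    grid.observe(36.15, -5.35, dest)
print(grid.predict(36.15, -5.35))  # ALGECIRAS (finest cell)
print(grid.predict(36.7, -5.35))   # ALGECIRAS (via the 1-degree fallback)
print(grid.predict(40.5, 0.5))     # None (no data nearby)
```

Each lookup is a constant-time dict access per resolution, which is what makes a hash-table grid attractive for streaming data.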