 Melilla


IberFire -- a detailed creation of a spatio-temporal dataset for wildfire risk assessment in Spain

Erzibengoa, Julen, Gómez-Omella, Meritxell, Goienetxea, Izaro

arXiv.org Artificial Intelligence

Wildfires pose a threat to ecosystems, economies and public safety, particularly in Mediterranean regions such as Spain. Accurate predictive models require high-resolution spatio-temporal data to capture complex dynamics of environmental and human factors. To address the scarcity of fine-grained wildfire datasets in Spain, we introduce IberFire: a spatio-temporal dataset with 1 km x 1 km x 1-day resolution, covering mainland Spain and the Balearic Islands from December 2007 to December 2024. IberFire integrates 120 features across eight categories: auxiliary data, fire history, geography, topography, meteorology, vegetation indices, human activity and land cover. All features and processing rely on open-access data and tools, with a publicly available codebase ensuring transparency and applicability. IberFire offers enhanced spatial granularity and feature diversity compared to existing European datasets, and provides a reproducible framework. It supports advanced wildfire risk modelling via Machine Learning and Deep Learning, facilitates climate trend analysis, and informs fire prevention and land management strategies. The dataset is freely available on Zenodo to promote open research and collaboration.


Benchmarking Graph Neural Networks for Document Layout Analysis in Public Affairs

Lopez-Duran, Miguel, Fierrez, Julian, Morales, Aythami, Tolosana, Ruben, Delgado-Mohatar, Oscar, Ortigosa, Alvaro

arXiv.org Artificial Intelligence

The automatic analysis of document layouts in digital-born PDF documents remains a challenging problem due to the heterogeneous arrangement of textual and nontextual elements and the imprecision of the textual metadata in the Portable Document Format. In this work, we benchmark Graph Neural Network (GNN) architectures for the task of fine-grained layout classification of text blocks from digital native documents. We introduce two graph construction structures: a k-closest-neighbor graph and a fully connected graph, and generate node features via pre-trained text and vision models, thus avoiding manual feature engineering. Three experimental frameworks are evaluated: single-modality (text or visual), concatenated multimodal, and dual-branch multimodal. We evaluated four foundational GNN models and compared them with the baseline. Our experiments are specifically conducted on a rich dataset of public affairs documents that includes more than 20 sources (e.g., regional and national-level official gazettes), 37K PDF documents, with 441K pages in total. Our results demonstrate that GraphSAGE operating on the k-closest-neighbor graph in a dual-branch configuration achieves the highest per-class and overall accuracy, outperforming the baseline in some sources. These findings confirm the importance of local layout relationships and multimodal fusion exploited through GNNs for the analysis of native digital document layouts.
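The two graph constructions named in the abstract can be illustrated with a minimal sketch. It assumes each text block is reduced to a 2D center point and that "closest" means Euclidean distance. The abstract specifies neither, and the function names and coordinates below are hypothetical rather than the authors' implementation.

```python
import math

def knn_edges(centers, k):
    """Return directed edges (i, j) linking each block i to its k closest blocks j."""
    edges = []
    for i, (xi, yi) in enumerate(centers):
        # Distance from block i to every other block, sorted ascending.
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(centers)
            if j != i
        )
        edges.extend((i, j) for _, j in dists[:k])
    return edges

def fully_connected_edges(n):
    """Return directed edges between every ordered pair of distinct blocks."""
    return [(i, j) for i in range(n) for j in range(n) if i != j]

# Hypothetical block centers (x, y) on one page, in arbitrary page units.
centers = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
knn = knn_edges(centers, k=2)
full = fully_connected_edges(len(centers))
```

The k-closest-neighbor graph keeps only local layout relationships and grows linearly with the number of blocks, whereas the fully connected graph grows quadratically.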


Context-Robust Knowledge Editing for Language Models

Park, Haewon, Choi, Gyubin, Kim, Minjun, Jo, Yohan

arXiv.org Artificial Intelligence

Knowledge editing (KE) methods offer an efficient way to modify knowledge in large language models. Current KE evaluations typically assess editing success by considering only the edited knowledge without any preceding contexts. In real-world applications, however, preceding contexts often trigger the retrieval of the original knowledge and undermine the intended edit. To address this issue, we develop CHED -- a benchmark designed to evaluate the context robustness of KE methods. Evaluations on CHED show that existing KE methods often fail when preceding contexts are present. To mitigate this shortcoming, we introduce CoRE, a KE method designed to strengthen context robustness by minimizing context-sensitive variance in hidden states of the model for edited knowledge. This method not only improves the editing success rate in situations where a preceding context is present but also preserves the overall capabilities of the model. We provide an in-depth analysis of the differing impacts of preceding contexts when introduced as user utterances versus assistant responses, and we dissect attention-score patterns to assess how specific tokens influence editing success.


A Machine Learning Approach for Identifying Anatomical Biomarkers of Early Mild Cognitive Impairment

Ahmad, Alwani Liyana, Sanchez-Bornot, Jose, Sotero, Roberto C., Coyle, Damien, Idris, Zamzuri, Faye, Ibrahima

arXiv.org Artificial Intelligence

Alzheimer's Disease (AD) is a progressive neurodegenerative disorder that primarily affects the aging population by impairing cognitive and motor functions. Early detection of AD through accessible methodologies like magnetic resonance imaging (MRI) is vital for developing effective interventions to halt or slow the disease's progression. This study aims to perform a comprehensive analysis of machine learning techniques for selecting MRI-based biomarkers and classifying individuals into healthy controls (HC) and unstable controls (uHC) who later show mild cognitive impairment within five years. The research utilizes MRI data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) and the Open Access Series of Imaging Studies 3 (OASIS-3), focusing on both HC and uHC participants. The study addresses the challenges of imbalanced data by testing classification methods on balanced and unbalanced datasets, and harmonizes data using polynomial regression to mitigate nuisance variables like age, gender, and intracranial volume. Results indicate that Gaussian Naive Bayes and RusBoost classifiers perform best, achieving accuracies of up to 76.46% and 72.48% respectively on the ADNI dataset. For the OASIS-3 dataset, Kernel Naive Bayes and RusBoost yield accuracies ranging from 64.66% to 75.71%, improving further in age-matched datasets. Brain regions like the entorhinal cortex, hippocampus, lateral ventricle, and lateral orbitofrontal cortex are identified as significantly impacted during early cognitive decline. Despite limitations such as small sample sizes, the study's harmonization approach enhances the robustness of biomarker selection, suggesting the potential of this semi-automatic machine learning pipeline for early AD detection using MRI.


Prompting as Probing: Using Language Models for Knowledge Base Construction

Alivanistos, Dimitrios, Santamaría, Selene Báez, Cochez, Michael, Kalo, Jan-Christoph, van Krieken, Emile, Thanapalasingam, Thiviyan

arXiv.org Artificial Intelligence

Language Models (LMs) have proven to be useful in various downstream applications, such as summarisation, translation, question answering and text classification. LMs are becoming increasingly important tools in Artificial Intelligence, because of the vast quantity of information they can store. In this work, we present ProP (Prompting as Probing), which utilizes GPT-3, a large Language Model originally proposed by OpenAI in 2020, to perform the task of Knowledge Base Construction (KBC). ProP implements a multi-step approach that combines a variety of prompting techniques to achieve this. Our results show that manual prompt curation is essential, that the LM must be encouraged to give answer sets of variable lengths, in particular including empty answer sets, that true/false questions are a useful device to increase precision on suggestions generated by the LM, that the size of the LM is a crucial factor, and that a dictionary of entity aliases improves the LM score. Our evaluation study indicates that these proposed techniques can substantially enhance the quality of the final predictions: ProP won track 2 of the LM-KBC competition, outperforming the baseline by 36.4 percentage points.


Migration Reframed? A multilingual analysis on the stance shift in Europe during the Ukrainian crisis

Wildemann, Sergej, Niederée, Claudia, Elejalde, Erick

arXiv.org Artificial Intelligence

The war in Ukraine seems to have positively changed the attitude toward the critical societal topic of migration in Europe -- at least towards refugees from Ukraine. We investigate whether this impression is substantiated by how the topic is reflected in online news and social media, thus linking the representation of the issue on the Web to its perception in society. For this purpose, we combine and adapt leading-edge automatic text processing for a novel multilingual stance detection approach. Starting from 5.5M Twitter posts published by 565 European news outlets in one year, beginning September 2021, plus replies, we perform a multilingual analysis of migration-related media coverage and associated social media interaction for Europe and selected European countries. The results of our analysis show that there is actually a reframing of the discussion illustrated by the terminology change, e.g., from "migrant" to "refugee", often even accentuated with phrases such as "real refugees". However, concerning a stance shift in public perception, the picture is more diverse than expected. All analyzed cases show a noticeable temporal stance shift around the start of the war in Ukraine. Still, there are apparent national differences in the size and stability of this shift.


Forecasting COVID-19 spreading through an ensemble of classical and machine learning models: Spain's case study

Cacha, Ignacio Heredia, Díaz, Judith Sainz-Pardo, Melguizo, María Castrillo, García, Álvaro López

arXiv.org Artificial Intelligence

In this work we evaluate the applicability of an ensemble of population models and machine learning models to predict the near future evolution of the COVID-19 pandemic, with a particular use case in Spain. We rely solely on open and public datasets, fusing incidence, vaccination, human mobility and weather data to feed our machine learning models (Random Forest, Gradient Boosting, k-Nearest Neighbours and Kernel Ridge Regression). We use the incidence data to adjust classic population models (Gompertz, Logistic, Richards, Bertalanffy) in order to be able to better capture the trend of the data. We then ensemble these two families of models in order to obtain a more robust and accurate prediction. Furthermore, we have observed an improvement in the predictions obtained with machine learning models as we add new features (vaccines, mobility, climatic conditions), analyzing the importance of each of them using Shapley Additive Explanation values. As in any other modelling work, the quality of the data and predictions has several limitations, and therefore they must be seen from a critical standpoint, as we discuss in the text. Our work concludes that the ensemble use of these models improves the individual predictions (using only machine learning models or only population models) and can be applied, with caution, in cases when compartmental models cannot be utilized due to the lack of relevant data.
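As a rough illustration of combining a population model with a machine learning forecast, the sketch below evaluates a Gompertz curve in its standard form and averages it with a second forecast. The abstract does not say how the two families are ensembled, so the plain mean is an assumption, and all parameter values and forecast numbers are hypothetical.

```python
import math

def gompertz(t, a, b, c):
    """Gompertz growth curve: a * exp(-b * exp(-c * t))."""
    return a * math.exp(-b * math.exp(-c * t))

def ensemble_mean(*forecasts):
    """Average several equally long forecast lists, day by day."""
    return [sum(values) / len(values) for values in zip(*forecasts)]

# Hypothetical parameters and forecasts, for illustration only.
days = [0, 1, 2]
population_forecast = [gompertz(t, a=1000.0, b=5.0, c=0.1) for t in days]
ml_forecast = [10.0, 12.0, 14.0]  # stand-in for a machine learning model's output
combined = ensemble_mean(population_forecast, ml_forecast)
```

In practice the parameters a, b and c would be fitted to the incidence data rather than fixed by hand, and a weighted or median combination could replace the mean.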


Europe's migration crisis seen from orbit

#artificialintelligence

In images taken from a satellite floating 400 kilometers above the Earth, Europe's humanitarian crisis shows up as white pixels against the blue-green vastness of the Mediterranean. Captured by the sensors in space, small overcrowded boats with migrants leaving Africa headed north look like tiny white comets bursting through the ocean, leaving a tail where they stir waves. "It's not that with every image I look at, I think about how someone could be dying right now," said Elisabeth Wittmann as she clicked through satellite footage on her laptop showing the coast west of the Libyan port of Sabratha. "That's also to protect myself," she added. The 26-year-old computer scientist from southern Germany is one of a dozen researchers who have teamed up with a new NGO called Space-Eye to develop artificial intelligence technology that allows computers to detect migrant boats in satellite images.


Spanish police corner, gun down Barcelona van attacker

The Japan Times

SUBIRATS, SPAIN – Spanish police on Monday shot dead an Islamist militant who killed 13 people with a van in Barcelona last week, ending a five-day manhunt for the perpetrator of Spain's deadliest attack in over a decade. Police said they tracked 22-year-old Younes Abouyaaqoub to a rural area near Barcelona and shot him after he held up what looked like an explosives belt and shouted "Allahu Akbar" (God is Greatest). A bomb squad then used a robot to approach his body. Abouyaaqoub had been on the run since Thursday evening, after he drove at high speed into throngs of strollers along Barcelona's most famous avenue, Las Ramblas. After fleeing the scene, he hijacked a car and fatally stabbed its driver.