Not enough data to create a plot.
Try a different view from the menu above.
Andalusia
Red Teaming Contemporary AI Models: Insights from Spanish and Basque Perspectives
Romero-Arjona, Miguel, Valle, Pablo, Alonso, Juan C., Sánchez, Ana B., Ugarte, Miriam, Cazalilla, Antonia, Cambrón, Vicente, Parejo, José A., Arrieta, Aitor, Segura, Sergio
The battle for AI leadership is on, with OpenAI in the United States and DeepSeek in China as key contenders. In response to these global trends, the Spanish government has proposed ALIA, a public and transparent AI infrastructure incorporating small language models designed to support Spanish and co-official languages such as Basque. This paper presents the results of Red Teaming sessions, where ten participants applied their expertise and creativity to manually test three of the latest models from these initiatives$\unicode{x2013}$OpenAI o3-mini, DeepSeek R1, and ALIA Salamandra$\unicode{x2013}$focusing on biases and safety concerns. The results, based on 670 conversations, revealed vulnerabilities in all the models under test, with biased or unsafe responses ranging from 29.5% in o3-mini to 50.6% in Salamandra. These findings underscore the persistent challenges in developing reliable and trustworthy AI systems, particularly those intended to support Spanish and Basque languages.
Log Optimization Simplification Method for Predicting Remaining Time
Ye, Jianhong, Zhang, Siyuan, Lin, Yan
Information systems generate a large volume of event log data during business operations, much of which consists of low-value and redundant information. When performance predictions are made directly from these logs, the accuracy of the predictions can be compromised. Researchers have explored methods to simplify and compress these data while preserving their valuable components. Most existing approaches focus on reducing the dimensionality of the data by eliminating redundant and irrelevant features. However, there has been limited investigation into the efficiency of execution both before and after event log simplification. In this paper, we present a prediction point selection algorithm designed to avoid the simplification of all points that function similarly. We select sequences or self-loop structures to form a simplifiable segment, and we optimize the deviation between the actual simplifiable value and the original data prediction value to prevent over-simplification. Experiments indicate that the simplified event log retains its predictive performance and, in some cases, enhances its predictive accuracy compared to the original event log.
Conditional Diffusion-Flow models for generating 3D cosmic density fields: applications to f(R) cosmologies
Riveros, Julieth Katherine, Saavedra, Paola, Hortua, Hector J., Garcia-Farieta, Jorge Enrique, Olier, Ivan
Next-generation galaxy surveys promise unprecedented precision in testing gravity at cosmological scales. However, realising this potential requires accurately modelling the non-linear cosmic web. We address this challenge by exploring conditional generative modelling to create 3D dark matter density fields via score-based (diffusion) and flow-based methods. Our results demonstrate the power of diffusion models to accurately reproduce the matter power spectra and bispectra, even for unseen configurations. They also offer a significant speed-up with slightly reduced accuracy, when flow-based reconstructing the probability distribution function, but they struggle with higher-order statistics. To improve conditional generation, we introduce a novel multi-output model to develop feature representations of the cosmological parameters. Our findings offer a powerful tool for exploring deviations from standard gravity, combining high precision with reduced computational cost, thus paving the way for more comprehensive and efficient cosmological analyses null .
Classification of HI Galaxy Profiles Using Unsupervised Learning and Convolutional Neural Networks: A Comparative Analysis and Methodological Cases of Studies
Jaimes-Illanes, Gabriel, Parra-Royon, Manuel, Darriba-Pol, Laura, Moldón, Javier, Sorgho, Amidou, Sánchez-Expósito, Susana, Garrido-Sánchez, Julián, Verdes-Montenegro, Lourdes
Hydrogen, the most abundant element in the universe, is crucial for understanding galaxy formation and evolution. The 21 cm neutral atomic hydrogen - HI spectral line maps the gas kinematics within galaxies, providing key insights into interactions, galactic structure, and star formation processes. With new radio instruments, the volume and complexity of data is increasing. To analyze and classify integrated HI spectral profiles in a efficient way, this work presents a framework that integrates Machine Learning techniques, combining unsupervised methods and CNNs. To this end, we apply our framework to a selected subsample of 318 spectral HI profiles of the CIG and 30.780 profiles from the Arecibo Legacy Fast ALFA Survey catalogue. Data pre-processing involved the Busyfit package and iterative fitting with polynomial, Gaussian, and double-Lorentzian models. Clustering methods, including K-means, spectral clustering, DBSCAN, and agglomerative clustering, were used for feature extraction and to bootstrap classification we applied K-NN, SVM, and Random Forest classifiers, optimizing accuracy with CNN. Additionally, we introduced a 2D model of the profiles to enhance classification by adding dimensionality to the data. Three 2D models were generated based on transformations and normalised versions to quantify the level of asymmetry. These methods were tested in a previous analytical classification study conducted by the Analysis of the Interstellar Medium in Isolated Galaxies group. This approach enhances classification accuracy and aims to establish a methodology that could be applied to data analysis in future surveys conducted with the Square Kilometre Array (SKA), currently under construction. All materials, code, and models have been made publicly available in an open-access repository, adhering to FAIR principles.
Parallel multi-objective metaheuristics for smart communications in vehicular networks
VANETs improve the safety and efficiency of the road traffic through powerful cooperative applications that gather and broadcast real-time road traffic information. Routing in VANETs is a critical issue in today's research due to the high speed of the nodes, rate of topology variability, and real-time restrictions of their applications. Hence, the research community is very active with hot topics, creating new VANET protocols and improving the existent ones (Lee et al. 2009). The Ad hoc On Demand Vector (AODV) routing proto-col (Perkins et al. 2003), which is optimized in this study, has been previously analyzed for use in vehicular environments. Some authors have proposed changes to its parameter configuration to gain huge improvements over its quality-of-service (QoS) in VANETs (Said and Nakamura 2014). The configuration parameters of AODV have a strongly non-linear relationship with each other and a complex influence on its final performance. In fact, they represent a mix of discrete plus continuous variables which makes it a hard challenge to find the "best" configuration in a real-world scenario. Thus, exact and enumerative methods are not applicable for solving the underlying optimization problem of finding the "best" AODV configuration, because they require critically long execution times to perform the search, and because we are far from having a traditional analytical equation. In this context, soft computing methods are a promising approach to find accurate QoS-efficient AODV configurations in rea-sonable times.
Automatic tuning of communication protocols for vehicular ad hoc networks using metaheuristics
García-Nieto, José, Toutouh, Jamal, Alba, Enrique
The emerging field of vehicular ad hoc networks (VANETs) deals with a set of communicating vehicles which are able to spontaneously interconnect without any pre-existing infrastructure. In such kind of networks, it is crucial to make an optimal configuration of the communication protocols previously to the final network deployment. This way, a human designer can obtain an optimal QoS of the network beforehand. The problem we consider in this work lies in configuring the File Transfer protocol Configuration (FTC) with the aim of optimizing the transmission time, the number of lost packets, and the amount of data transferred in realistic VANET scenarios. We face the FTC with five representative state-of-the-art optimization techniques and compare their performance. These algorithms are: Particle Swarm Optimization (PSO), Differential Evolution (DE), Genetic Algorithm (GA), Evolutionary Strategy (ES), and Simulated Annealing (SA). For our tests, two typical environment instances of VANETs for Urban and Highway scenarios have been defined. The experiments using ns- 2 (a well-known realistic VANET simulator) reveal that PSO outperforms all the compared algorithms for both studied VANET instances.
Cooperative Patrol Routing: Optimizing Urban Crime Surveillance through Multi-Agent Reinforcement Learning
Palma-Borda, Juan, Guzmán, Eduardo, Belmonte, María-Victoria
The effective design of patrol strategies is a difficult and complex problem, especially in medium and large areas. The objective is to plan, in a coordinated manner, the optimal routes for a set of patrols in a given area, in order to achieve maximum coverage of the area, while also trying to minimize the number of patrols. In this paper, we propose a multi-agent reinforcement learning (MARL) model, based on a decentralized partially observable Markov decision process, to plan unpredictable patrol routes within an urban environment represented as an undirected graph. The model attempts to maximize a target function that characterizes the environment within a given time frame. Our model has been tested to optimize police patrol routes in three medium-sized districts of the city of Malaga. The aim was to maximize surveillance coverage of the most crime-prone areas, based on actual crime data in the city. To address this problem, several MARL algorithms have been studied, and among these the Value Decomposition Proximal Policy Optimization (VDPPO) algorithm exhibited the best performance. We also introduce a novel metric, the coverage index, for the evaluation of the coverage performance of the routes generated by our model. This metric is inspired by the predictive accuracy index (PAI), which is commonly used in criminology to detect hotspots. Using this metric, we have evaluated the model under various scenarios in which the number of agents (or patrols), their starting positions, and the level of information they can observe in the environment have been modified. Results show that the coordinated routes generated by our model achieve a coverage of more than $90\%$ of the $3\%$ of graph nodes with the highest crime incidence, and $65\%$ for $20\%$ of these nodes; $3\%$ and $20\%$ represent the coverage standards for police resource allocation.
Unveiling the Potential of Text in High-Dimensional Time Series Forecasting
Zhou, Xin, Wang, Weiqing, Qu, Shilin, Zhang, Zhiqiang, Bergmeir, Christoph
Time series forecasting has traditionally focused on univariate and multivariate numerical data, often overlooking the benefits of incorporating multimodal information, particularly textual data. In this paper, we propose a novel framework that integrates time series models with Large Language Models to improve high-dimensional time series forecasting. Inspired by multimodal models, our method combines time series and textual data in the dual-tower structure. This fusion of information creates a comprehensive representation, which is then processed through a linear layer to generate the final forecast. Extensive experiments demonstrate that incorporating text enhances high-dimensional time series forecasting performance. This work paves the way for further research in multimodal time series forecasting.
A Digital Shadow for Modeling, Studying and Preventing Urban Crime
Palma-Borda, Juan, Guzmán, Eduardo, Belmonte, María-Victoria
Crime is one of the greatest threats to urban security. Around 80 percent of the world's population lives in countries with high levels of criminality. Most of the crimes committed in the cities take place in their urban environments. This paper presents the development and validation of a digital shadow platform for modeling and simulating urban crime. This digital shadow has been constructed using data-driven agent-based modeling and simulation techniques, which are suitable for capturing dynamic interactions among individuals and with their environment. Our approach transforms and integrates well-known criminological theories and the expert knowledge of law enforcement agencies (LEA), policy makers, and other stakeholders under a theoretical model, which is in turn combined with real crime, spatial (cartographic) and socio-economic data into an urban model characterizing the daily behavior of citizens. The digital shadow has also been instantiated for the city of Malaga, for which we had over 300,000 complaints available. This instance has been calibrated with those complaints and other geographic and socio-economic information of the city. To the best of our knowledge, our digital shadow is the first for large urban areas that has been calibrated with a large dataset of real crime reports and with an accurate representation of the urban environment. The performance indicators of the model after being calibrated, in terms of the metrics widely used in predictive policing, suggest that our simulated crime generation matches the general pattern of crime in the city according to historical data. Our digital shadow platform could be an interesting tool for modeling and predicting criminal behavior in an urban environment on a daily basis and, thus, a useful tool for policy makers, criminologists, sociologists, LEAs, etc. to study and prevent urban crime.
Splitting criteria for ordinal decision trees: an experimental study
Ayllón-Gavilán, Rafael, Martínez-Estudillo, Francisco José, Guijo-Rubio, David, Hervás-Martínez, César, Gutiérrez, Pedro Antonio
Ordinal Classification (OC) is a machine learning field that addresses classification tasks where the labels exhibit a natural order. Unlike nominal classification, which treats all classes as equally distinct, OC takes the ordinal relationship into account, producing more accurate and relevant results. This is particularly critical in applications where the magnitude of classification errors has implications. Despite this, OC problems are often tackled using nominal methods, leading to suboptimal solutions. Although decision trees are one of the most popular classification approaches, ordinal tree-based approaches have received less attention when compared to other classifiers. This work conducts an experimental study of tree-based methodologies specifically designed to capture ordinal relationships. A comprehensive survey of ordinal splitting criteria is provided, standardising the notations used in the literature for clarity. Three ordinal splitting criteria, Ordinal Gini (OGini), Weighted Information Gain (WIG), and Ranking Impurity (RI), are compared to the nominal counterparts of the first two (Gini and information gain), by incorporating them into a decision tree classifier. An extensive repository considering 45 publicly available OC datasets is presented, supporting the first experimental comparison of ordinal and nominal splitting criteria using well-known OC evaluation metrics. Statistical analysis of the results highlights OGini as the most effective ordinal splitting criterion to date. Source code, datasets, and results are made available to the research community.