
Collaborating Authors

 Tabik, Siham


On Large Uni- and Multi-modal Models for Unsupervised Classification of Social Media Images: Nature's Contribution to People as a case study

arXiv.org Artificial Intelligence

Social media images have proven to be a valuable source of information for understanding human interactions with important subjects such as cultural heritage, biodiversity, and nature, among others. The task of grouping such images into a number of semantically meaningful clusters without labels is challenging due to the high diversity and complex nature of the visual content, in addition to their large volume. On the other hand, recent advances in Large Visual Models (LVMs), Large Language Models (LLMs), and Large Visual Language Models (LVLMs) provide an important opportunity to explore new, productive, and scalable solutions. This work proposes, analyzes, and compares various approaches based on one or more state-of-the-art LVMs, LLMs, and LVLMs for mapping social media images into a number of predefined classes. As a case study, we consider the problem of understanding the interactions between humans and nature, also known as Nature's Contribution to People or Cultural Ecosystem Services (CES). Our experiments show that the highest-performing approaches, with accuracy above 95%, still require the creation of a small labeled dataset. These include the fine-tuned LVM DINOv2 and the LVLM LLaVA-1.5 combined with a fine-tuned LLM. The top fully unsupervised approaches, achieving accuracy above 84%, are the LVLMs, specifically the proprietary GPT-4 model and the public LLaVA-1.5 model. Additionally, the LVM DINOv2, when applied in a 10-shot learning setup, delivered competitive results with an accuracy of 83.99%, closely matching the performance of the LVLM LLaVA-1.5.
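
A minimal sketch of the few-shot idea described above, assuming a public DINOv2 backbone from torch.hub and a nearest-centroid rule over L2-normalized embeddings; the preprocessing and classifier are illustrative assumptions, not the paper's exact protocol.

```python
# Few-shot classification of social media images with DINOv2 embeddings
# and a nearest-centroid rule. Illustrative only; the paper's exact
# 10-shot protocol and classifier may differ.
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import transforms

# Public DINOv2 backbone (ViT-B/14) from torch.hub.
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14")
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed(paths):
    batch = torch.stack([preprocess(Image.open(p).convert("RGB")) for p in paths])
    return F.normalize(model(batch), dim=-1)  # L2-normalized CLS features

def fit_centroids(support):
    # support: dict mapping a CES class name to ~10 labeled image paths.
    return {cls: embed(paths).mean(0) for cls, paths in support.items()}

def predict(query_paths, centroids):
    feats = embed(query_paths)
    names = list(centroids)
    protos = torch.stack([centroids[n] for n in names])
    idx = (feats @ protos.T).argmax(dim=1).tolist()  # cosine similarity
    return [names[i] for i in idx]
```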


What is the best RNN-cell structure to forecast each time series behavior?

arXiv.org Artificial Intelligence

It is unquestionable that time series forecasting is of paramount importance in many fields. The most widely used machine learning models for time series forecasting tasks are Recurrent Neural Networks (RNNs). Typically, these models are built using one of the three most popular cells: the Elman, Long Short-Term Memory (LSTM), or Gated Recurrent Unit (GRU) cell. Each cell has a different structure and implies a different computational cost. However, it is not clear why and when to use each RNN-cell structure. In fact, there is no comprehensive characterization of all the possible time series behaviors and no guidance on which RNN cell structure is the most suitable for each behavior. The objective of this study is twofold: it presents a comprehensive taxonomy of almost all time series behaviors and provides insights into the best RNN cell structure for each behavior. We conducted two experiments: (1) we evaluate and analyze the role of each component in the LSTM-Vanilla cell by creating 11 variants, each based on a single alteration of its basic architecture (removing, adding, or substituting one cell component); (2) we evaluate and analyze the performance of 20 possible RNN-cell structures. To evaluate, compare, and select the best model, different statistical metrics were used: error-based, information criterion-based, naive-based, and direction change-based metrics. To further strengthen our confidence in model interpretation and selection, the Friedman test with Wilcoxon-Holm post-hoc analysis was used. Our results advocate the use and exploration of the newly created RNN variant, named SLIM, for time series forecasting, thanks to its ability to accurately predict the different time series behaviors and its simple structural design that does not demand expensive time and computing resources.
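
To illustrate the ablation methodology (one alteration per variant), the sketch below removes a single component, the output gate, from a vanilla LSTM cell in PyTorch. This is an illustrative variant only; it is not the paper's SLIM cell, whose exact design is not given in the abstract.

```python
# An LSTM cell variant with one component removed (the output gate),
# mirroring the "one alteration per variant" ablation design.
import torch
import torch.nn as nn

class LSTMCellNoOutputGate(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        # Three transforms instead of four: input gate, forget gate, candidate.
        self.linear = nn.Linear(input_size + hidden_size, 3 * hidden_size)
        self.hidden_size = hidden_size

    def forward(self, x, state):
        h, c = state
        z = self.linear(torch.cat([x, h], dim=-1))
        i, f, g = z.chunk(3, dim=-1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.tanh(c)  # output gate removed: h is simply tanh(c)
        return h, (h, c)

# One forecasting step over a batch of univariate series:
cell = LSTMCellNoOutputGate(input_size=1, hidden_size=32)
x = torch.randn(8, 1)                                   # batch of 8, 1 feature
state = (torch.zeros(8, 32), torch.zeros(8, 32))
h, state = cell(x, state)
```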


Shrub of a thousand faces: an individual segmentation from satellite images using deep learning

arXiv.org Artificial Intelligence

Monitoring the distribution and size structure of long-living shrubs, such as Juniperus communis, can be used to estimate the long-term effects of climate change on high-mountain and high-latitude ecosystems. Historical very-high-resolution aerial imagery offers a retrospective tool for monitoring shrub growth and distribution at high precision. Currently, deep learning models provide impressive results for detecting and delineating the contours of objects with well-defined shapes. However, adapting these models to detect natural objects with complex growth patterns, such as junipers, is still a challenging task. This research presents a novel approach that leverages remotely sensed RGB imagery in conjunction with Mask R-CNN-based instance segmentation models to individually delineate Juniperus shrubs above the treeline in Sierra Nevada (Spain). In this study, we propose a new data construction design that uses photo-interpreted (PI) and field-work (FW) data to develop and externally validate the model, respectively. We also propose a new shrub-tailored evaluation algorithm based on a new metric, Multiple Intersections over Ground Truth Area (MIoGTA), to assess and optimize the model's shrub delineation performance. Finally, we deploy the developed model for the first time to generate a wall-to-wall map of Juniperus individuals. The experimental results demonstrate the efficiency of our dual data construction approach in overcoming the limitations associated with traditional field survey methods. They also highlight the robustness of the MIoGTA metric in evaluating instance segmentation models on species with complex growth patterns, showing greater resilience to data annotation uncertainty. Furthermore, they show the effectiveness of Mask R-CNN with a ResNet101-C4 backbone in delineating PI and FW shrubs, achieving F1-scores of 87.87% and 76.86%, respectively.
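
As a hedged sketch of the model side, the snippet below runs inference with a Mask R-CNN ResNet101-C4 model through Detectron2's model zoo. Fine-tuning on juniper imagery, the PI/FW data design, and the MIoGTA evaluation are not reproduced, and the input file name is a placeholder.

```python
# Inference with Mask R-CNN + ResNet101-C4, the backbone reported to work
# best here, using Detectron2's COCO-pretrained weights as a starting point.
import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_101_C4_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-InstanceSegmentation/mask_rcnn_R_101_C4_3x.yaml")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # confidence cutoff
cfg.MODEL.DEVICE = "cpu"                     # drop this line to run on GPU

predictor = DefaultPredictor(cfg)
image = cv2.imread("aerial_tile.png")        # placeholder tile (cv2 loads BGR)
instances = predictor(image)["instances"]
masks = instances.pred_masks                 # one boolean mask per instance
print(f"{len(instances)} instances detected")
```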


Bidirectional recurrent imputation and abundance estimation of LULC classes with MODIS multispectral time series and geo-topographic and climatic data

arXiv.org Artificial Intelligence

Remotely sensed data are dominated by mixed Land Use and Land Cover (LULC) types. Spectral unmixing (SU) is a key technique that disentangles mixed pixels into constituent LULC types and their abundance fractions. While existing studies on Deep Learning (DL) for SU typically focus on single-time-step hyperspectral (HS) or multispectral (MS) data, our work pioneers SU using MODIS MS time series, addressing missing data with end-to-end DL models. Our approach enhances a Long Short-Term Memory (LSTM)-based model by incorporating geographic, topographic (geo-topographic), and climatic ancillary information. Notably, our method eliminates the need for explicit endmember extraction, instead learning the input-output relationship between mixed spectra and LULC abundances through supervised learning. Experimental results demonstrate that integrating spectral-temporal input data with geo-topographic and climatic information significantly improves the estimation of LULC abundances in mixed pixels. To facilitate this study, we curated a novel labeled dataset for Andalusia (Spain) with monthly MODIS multispectral time series at 460 m resolution for 2013. Named Andalusia MultiSpectral MultiTemporal Unmixing (Andalusia-MSMTU), this dataset provides pixel-level annotations of LULC abundances along with ancillary information. The dataset (https://zenodo.org/records/7752348) and code (https://github.com/jrodriguezortega/MSMTU) are publicly available.
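
The sketch below conveys the general architecture idea under stated assumptions: a bidirectional LSTM encodes the monthly multispectral series, geo-topographic and climatic features are concatenated to the sequence encoding, and a softmax head yields abundance fractions that sum to one. All dimensions (bands, ancillary features, classes) are illustrative, not those of Andalusia-MSMTU.

```python
# Supervised unmixing sketch: time series -> BiLSTM encoding, concatenated
# with ancillary features, mapped to per-class abundance fractions.
import torch
import torch.nn as nn

class UnmixingLSTM(nn.Module):
    def __init__(self, n_bands=7, n_ancillary=10, n_classes=12, hidden=64):
        super().__init__()
        self.rnn = nn.LSTM(n_bands, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Sequential(
            nn.Linear(2 * hidden + n_ancillary, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, series, ancillary):
        # series: (batch, 12 months, n_bands); ancillary: (batch, n_ancillary)
        _, (h, _) = self.rnn(series)
        h = torch.cat([h[-2], h[-1], ancillary], dim=-1)  # fwd + bwd states
        return torch.softmax(self.head(h), dim=-1)        # abundances sum to 1

model = UnmixingLSTM()
abund = model(torch.randn(4, 12, 7), torch.randn(4, 10))
assert torch.allclose(abund.sum(-1), torch.ones(4))
```

Training would regress these outputs against the dataset's pixel-level abundance labels, e.g. with a mean-squared-error or KL-divergence loss.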


EXplainable Neural-Symbolic Learning (X-NeSyL) methodology to fuse deep learning representations with expert knowledge graphs: the MonuMAI cultural heritage use case

arXiv.org Artificial Intelligence

The latest Deep Learning (DL) models for detection and classification have achieved unprecedented performance over classical machine learning algorithms. However, DL models are black-box methods that are hard to debug, interpret, and certify. DL alone cannot provide explanations that can be validated by a non-technical audience. In contrast, symbolic AI systems that convert concepts into rules or symbols, such as knowledge graphs, are easier to explain. However, they present lower generalisation and scaling capabilities. A very important challenge is to fuse DL representations with expert knowledge. One way to address this challenge, as well as the performance-explainability trade-off, is to leverage the best of both streams without sidelining domain expert knowledge. We tackle this problem by assuming that the symbolic knowledge is expressed in the form of a domain-expert knowledge graph. We present the eXplainable Neural-symbolic learning (X-NeSyL) methodology, designed to learn both symbolic and deep representations, together with an explainability metric to assess the level of alignment between machine and human expert explanations. The ultimate objective is to fuse DL representations with expert domain knowledge during the learning process so that it serves as a sound basis for explainability. The X-NeSyL methodology involves the concrete use of two notions of explanation, at inference and training time respectively: 1) EXPLANet: Expert-aligned eXplainable Part-based cLAssifier NETwork Architecture, a compositional CNN that makes use of symbolic representations, and 2) SHAP-Backprop, an explainable AI-informed training procedure that guides the DL process to align with such symbolic representations in the form of knowledge graphs. We showcase the X-NeSyL methodology using the MonuMAI dataset for monument facade image classification, and demonstrate that our approach improves both explainability and performance.
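
A heavily simplified sketch of the compositional, part-based intuition behind EXPLANet: a CNN scores symbolic parts (e.g., architectural elements), and a linear head maps part scores to a facade style, so the part vector doubles as an explanation that can be compared with a part-to-style knowledge graph. The part and style counts are hypothetical, and SHAP-Backprop training is not reproduced.

```python
# Part-based compositional classifier sketch: image -> part scores -> style.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class PartBasedClassifier(nn.Module):
    def __init__(self, n_parts=15, n_styles=4):
        super().__init__()
        backbone = resnet18(weights=None)
        backbone.fc = nn.Linear(backbone.fc.in_features, n_parts)
        self.part_scorer = backbone                     # image -> part scores
        self.style_head = nn.Linear(n_parts, n_styles)  # parts -> style logits

    def forward(self, x):
        parts = torch.sigmoid(self.part_scorer(x))  # presence score per part
        return self.style_head(parts), parts         # logits + explanation

model = PartBasedClassifier()
logits, parts = model(torch.randn(2, 3, 224, 224))
```

The `parts` vector is the machine-side explanation whose alignment with expert annotations an explainability metric could then assess.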


Lights and Shadows in Evolutionary Deep Learning: Taxonomy, Critical Methodological Analysis, Cases of Study, Learned Lessons, Recommendations and Challenges

arXiv.org Artificial Intelligence

Much has been said about the fusion of bio-inspired optimization algorithms and Deep Learning models for several purposes: from the discovery of network topologies and hyper-parametric configurations with improved performance for a given task, to the optimization of the model's parameters as a replacement for gradient-based solvers. Indeed, the literature is rich in proposals showcasing the application of assorted nature-inspired approaches to these tasks. In this work, we comprehensively review and critically examine contributions made so far along three axes, each addressing a fundamental question in this research avenue: a) optimization and taxonomy (Why?), including a historical perspective, definitions of optimization problems in Deep Learning, and a taxonomy associated with an in-depth analysis of the literature; b) critical methodological analysis (How?), which, together with two case studies, allows us to distill lessons learned and recommendations for good practice from the analysis of the literature; and c) challenges and new directions of research (What can be done, and what for?). In summary, these three axes (optimization and taxonomy, critical analysis, and challenges) outline a complete vision of the merger of the two technologies and chart an exciting future for this area of fusion research.
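
One of the use cases in the taxonomy, evolving a network's hyper-parameter configuration, can be sketched as a small (mu + lambda) evolutionary loop. The fitness function below is a synthetic stand-in for a real validation score, which would require training each candidate model.

```python
# (mu + lambda) evolutionary search over hyper-parameters. The fitness
# function is a toy surrogate; in practice it would train and validate
# a model under each candidate configuration.
import random

def fitness(cfg):
    # Stand-in for validation accuracy after training with this config.
    ideal = {"log_lr": -3.0, "width": 128}
    return -((cfg["log_lr"] - ideal["log_lr"]) ** 2
             + ((cfg["width"] - ideal["width"]) / 64) ** 2)

def mutate(cfg):
    return {
        "log_lr": cfg["log_lr"] + random.gauss(0, 0.3),
        "width": max(8, int(cfg["width"] * random.choice([0.5, 1, 2]))),
    }

mu, lam = 4, 12
population = [{"log_lr": random.uniform(-5, -1),
               "width": random.choice([32, 64, 128, 256])} for _ in range(mu)]

for generation in range(20):
    offspring = [mutate(random.choice(population)) for _ in range(lam)]
    population = sorted(population + offspring, key=fitness, reverse=True)[:mu]

print("best config:", population[0])
```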


Explainable Artificial Intelligence (XAI): Concepts, Taxonomies, Opportunities and Challenges toward Responsible AI

arXiv.org Artificial Intelligence

In recent years, Artificial Intelligence (AI) has gained notable momentum that may deliver on high expectations across many application sectors. For this to occur, the entire community faces the barrier of explainability, an inherent problem of the sub-symbolic AI techniques (e.g., ensembles or Deep Neural Networks) that were not present in the previous wave of AI. Paradigms addressing this problem fall within the so-called eXplainable AI (XAI) field, which is acknowledged as a crucial feature for the practical deployment of AI models. This overview examines the existing literature in the field of XAI, including a prospective view of what is yet to be achieved. We summarize previous efforts to define explainability in Machine Learning, establishing a novel definition that covers prior conceptual propositions with a major focus on the audience for which explainability is sought. We then propose and discuss a taxonomy of recent contributions related to the explainability of different Machine Learning models, including a second taxonomy built specifically for Deep Learning methods. This literature analysis serves as the background for a series of challenges faced by XAI, such as the crossroads between data fusion and explainability. Our prospects lead toward the concept of Responsible Artificial Intelligence, namely, a methodology for the large-scale implementation of AI methods in real organizations with fairness, model explainability, and accountability at its core. Our ultimate goal is to provide newcomers to XAI with reference material that stimulates future research advances, and also to encourage experts and professionals from other disciplines to embrace the benefits of AI in their activity sectors, without prior bias about its lack of interpretability.
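
As one concrete instance of the post-hoc, model-agnostic explainability techniques the overview taxonomizes, the sketch below uses permutation importance: it measures how much a black-box model's score degrades when a single feature is shuffled. The dataset and model are toy choices for illustration.

```python
# Post-hoc, model-agnostic explanation of a black-box classifier via
# permutation importance on a held-out set.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
ranked = sorted(zip(X.columns, result.importances_mean),
                key=lambda t: t[1], reverse=True)
for name, score in ranked[:5]:
    print(f"{name}: {score:.4f}")  # features an audience can inspect
```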


Deep Learning in Video Multi-Object Tracking: A Survey

arXiv.org Machine Learning

The problem of Multiple Object Tracking (MOT) consists of following the trajectories of different objects in a sequence, usually a video. In recent years, with the rise of Deep Learning, the algorithms that address this problem have benefited from the representational power of deep models. This paper provides a comprehensive survey of works that employ Deep Learning models to solve the task of MOT on single-camera videos. Four main steps in MOT algorithms are identified, and an in-depth review of how Deep Learning has been employed at each of these stages is presented. A complete experimental comparison of the presented works on the three MOTChallenge datasets is also provided, identifying a number of similarities among the top-performing methods and presenting some possible future research directions.
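
Two of the classic MOT steps, affinity computation and association, can be sketched with IoU affinities solved as a bipartite matching via the Hungarian algorithm, in the spirit of SORT-style trackers; the threshold and boxes below are illustrative.

```python
# IoU-based affinity + Hungarian association between existing tracks and
# new detections, a common building block in the surveyed MOT pipelines.
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    # boxes as [x1, y1, x2, y2]
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def associate(tracks, detections, min_iou=0.3):
    cost = np.array([[1 - iou(t, d) for d in detections] for t in tracks])
    rows, cols = linear_sum_assignment(cost)  # minimize total (1 - IoU)
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= 1 - min_iou]

tracks = [[10, 10, 50, 50], [100, 100, 150, 160]]
detections = [[12, 11, 52, 49], [98, 102, 149, 158], [300, 300, 320, 330]]
print(associate(tracks, detections))  # matched (track, detection) pairs
```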