Goto

Collaborating Authors

 Matteucci, Matteo


Neuro-Symbolic Scene Graph Conditioning for Synthetic Image Dataset Generation

arXiv.org Artificial Intelligence

As machine learning models increase in scale and complexity, obtaining sufficient training data has become a critical bottleneck due to acquisition costs, privacy constraints, and data scarcity in specialised domains. While synthetic data generation has emerged as a promising alternative, a notable performance gap remains compared to models trained on real data, particularly as task complexity grows. Concurrently, Neuro-Symbolic methods, which combine neural networks' learning strengths with symbolic reasoning's structured representations, have demonstrated significant potential across various cognitive tasks. This paper explores the utility of Neuro-Symbolic conditioning for synthetic image dataset generation, focusing specifically on improving the performance of Scene Graph Generation models. The research investigates whether structured symbolic representations in the form of scene graphs can enhance synthetic data quality through explicit encoding of relational constraints. The results demonstrate that Neuro-Symbolic conditioning yields significant improvements of up to +2.59% in standard Recall metrics and +2.83% in No Graph Constraint Recall metrics when used for dataset augmentation. These findings establish that merging Neuro-Symbolic and generative approaches produces synthetic data with complementary structural information that enhances model performance when combined with real data, providing a novel approach to overcome data scarcity limitations even for complex visual reasoning tasks.


ZO-DARTS++: An Efficient and Size-Variable Zeroth-Order Neural Architecture Search Algorithm

arXiv.org Artificial Intelligence

Differentiable Neural Architecture Search (NAS) provides a promising avenue for automating the complex design of deep learning (DL) models. However, current differentiable NAS methods often face constraints in efficiency, operation selection, and adaptability under varying resource limitations. We introduce ZO-DARTS++, a novel NAS method that effectively balances performance and resource constraints. By integrating a zeroth-order approximation for efficient gradient handling, employing a sparsemax function with temperature annealing for clearer and more interpretable architecture distributions, and adopting a size-variable search scheme for generating compact yet accurate architectures, ZO-DARTS++ establishes a new balance between model complexity and performance. In extensive tests on medical imaging datasets, ZO-DARTS++ improves the average accuracy by up to 1.8\% over standard DARTS-based methods and shortens search time by approximately 38.6\%. Additionally, its resource-constrained variants can reduce the number of parameters by more than 35\% while maintaining competitive accuracy levels. Thus, ZO-DARTS++ offers a versatile and efficient framework for generating high-quality, resource-aware DL models suitable for real-world medical applications.


A Spatio-temporal Graph Network Allowing Incomplete Trajectory Input for Pedestrian Trajectory Prediction

arXiv.org Artificial Intelligence

Pedestrian trajectory prediction is important in the research of mobile robot navigation in environments with pedestrians. Most pedestrian trajectory prediction algorithms require the input historical trajectories to be complete. If a pedestrian is unobservable in any frame in the past, then its historical trajectory become incomplete, the algorithm will not predict its future trajectory. To address this limitation, we propose the STGN-IT, a spatio-temporal graph network allowing incomplete trajectory input, which can predict the future trajectories of pedestrians with incomplete historical trajectories. STGN-IT uses the spatio-temporal graph with an additional encoding method to represent the historical trajectories and observation states of pedestrians. Moreover, STGN-IT introduces static obstacles in the environment that may affect the future trajectories as nodes to further improve the prediction accuracy. A clustering algorithm is also applied in the construction of spatio-temporal graphs. Experiments on public datasets show that STGN-IT outperforms state of the art algorithms on these metrics.


POMONAG: Pareto-Optimal Many-Objective Neural Architecture Generator

arXiv.org Artificial Intelligence

Neural Architecture Search (NAS) automates neural network design, reducing dependence on human expertise. While NAS methods are computationally intensive and dataset-specific, auxiliary predictors reduce the models needing training, decreasing search time. This strategy is used to generate architectures satisfying multiple computational constraints. Recently, Transferable NAS has emerged, generalizing the search process from dataset-dependent to task-dependent. In this field, DiffusionNAG is a state-of-the-art method. This diffusion-based approach streamlines computation, generating architectures optimized for accuracy on unseen datasets without further adaptation. However, by focusing solely on accuracy, DiffusionNAG overlooks other crucial objectives like model complexity, computational efficiency, and inference latency -- factors essential for deploying models in resource-constrained environments. This paper introduces the Pareto-Optimal Many-Objective Neural Architecture Generator (POMONAG), extending DiffusionNAG via a many-objective diffusion process. POMONAG simultaneously considers accuracy, number of parameters, multiply-accumulate operations (MACs), and inference latency. It integrates Performance Predictor models to estimate these metrics and guide diffusion gradients. POMONAG's optimization is enhanced by expanding its training Meta-Dataset, applying Pareto Front Filtering, and refining embeddings for conditional generation. These enhancements enable POMONAG to generate Pareto-optimal architectures that outperform the previous state-of-the-art in performance and efficiency. Results were validated on two search spaces -- NASBench201 and MobileNetV3 -- and evaluated across 15 image classification datasets.


FPBoost: Fully Parametric Gradient Boosting for Survival Analysis

arXiv.org Artificial Intelligence

Survival analysis is a critical tool for analyzing time-to-event data and extracting valuable clinical insights. Recently, numerous machine learning techniques leveraging neural networks and decision trees have been developed for this task. Among these, the most successful approaches often rely on specific assumptions about the shape of the modeled hazard function. These assumptions include proportional hazard, accelerated failure time, or discrete estimation at a predefined set of time points. In this study, we propose a novel paradigm for survival model design based on the weighted sum of individual fully parametric hazard contributions. We build upon well-known ensemble techniques to deliver a novel contribution to the field by applying additive hazard functions, improving over approaches based on survival or cumulative hazard functions. Furthermore, the proposed model, which we call FPBoost, is the first algorithm to directly optimize the survival likelihood via gradient boosting. We evaluated our approach across a diverse set of datasets, comparing it against a variety of state-of-the-art models. The results demonstrate that FPBoost improves risk estimation, according to both concordance and calibration metrics.


Can CLIP help CLIP in learning 3D?

arXiv.org Artificial Intelligence

In this study, we explore an alternative approach to enhance contrastive text-image-3D alignment in the absence of textual descriptions for 3D objects. We introduce two unsupervised methods, $I2I$ and $(I2L)^2$, which leverage CLIP knowledge about textual and 2D data to compute the neural perceived similarity between two 3D samples. We employ the proposed methods to mine 3D hard negatives, establishing a multimodal contrastive pipeline with hard negative weighting via a custom loss function. We train on different configurations of the proposed hard negative mining approach, and we evaluate the accuracy of our models in 3D classification and on the cross-modal retrieval benchmark, testing image-to-shape and shape-to-image retrieval. Results demonstrate that our approach, even without explicit text alignment, achieves comparable or superior performance on zero-shot and standard 3D classification, while significantly improving both image-to-shape and shape-to-image retrieval compared to previous methods.


The Empirical Impact of Forgetting and Transfer in Continual Visual Odometry

arXiv.org Artificial Intelligence

As robotics continues to advance, the need for adaptive and continuously-learning embodied agents increases, particularly in the realm of assistance robotics. Quick adaptability and long-term information retention are essential to operate in dynamic environments typical of humans' everyday lives. A lifelong learning paradigm is thus required, but it is scarcely addressed by current robotics literature. This study empirically investigates the impact of catastrophic forgetting and the effectiveness of knowledge transfer in neural networks trained continuously in an embodied setting. We focus on the task of visual odometry, which holds primary importance for embodied agents in enabling their self-localization. In this regime, we observe initial satisfactory performance with high transferability between environments, followed by a specialization phase where the model prioritizes current environment-specific knowledge at the expense of generalization. Conventional regularization strategies and increased model capacity prove ineffective in mitigating this phenomenon. Rehearsal is instead mildly beneficial but with the addition of a substantial memory cost. Incorporating action information, as commonly done in embodied settings, facilitates quicker convergence but exacerbates specialization, making the model overly reliant on its motion expectations and less adept at correctly interpreting visual cues. These findings emphasize the open challenges of balancing adaptation and memory retention in lifelong robotics and contribute valuable insights into the application of a lifelong paradigm on embodied agents. The field of robotics has witnessed significant growth and has the potential to take a crucial role in various aspects of human life. An example of their potential is the domain of assistance robotics, where robots are designed to aid individuals in their daily activities, especially within home environments.


A Lightweight Neural Architecture Search Model for Medical Image Classification

arXiv.org Artificial Intelligence

Accurate classification of medical images is essential for modern diagnostics. Deep learning advancements led clinicians to increasingly use sophisticated models to make faster and more accurate decisions, sometimes replacing human judgment. However, model development is costly and repetitive. Neural Architecture Search (NAS) provides solutions by automating the design of deep learning architectures. This paper presents ZO-DARTS+, a differentiable NAS algorithm that improves search efficiency through a novel method of generating sparse probabilities by bi-level optimization. Experiments on five public medical datasets show that ZO-DARTS+ matches the accuracy of state-of-the-art solutions while reducing search times by up to three times.


Stable Diffusion Dataset Generation for Downstream Classification Tasks

arXiv.org Artificial Intelligence

Recent advances in generative artificial intelligence have enabled the creation of high-quality synthetic data that closely mimics real-world data. This paper explores the adaptation of the Stable Diffusion 2.0 model for generating synthetic datasets, using Transfer Learning, Fine-Tuning and generation parameter optimisation techniques to improve the utility of the dataset for downstream classification tasks. We present a class-conditional version of the model that exploits a Class-Encoder and optimisation of key generation parameters. Our methodology led to synthetic datasets that, in a third of cases, produced models that outperformed those trained on real datasets.


Latent Neural Cellular Automata for Resource-Efficient Image Restoration

arXiv.org Artificial Intelligence

Neural cellular automata represent an evolution of the traditional cellular automata model, enhanced by the integration of a deep learning-based transition function. This shift from a manual to a data-driven approach significantly increases the adaptability of these models, enabling their application in diverse domains, including content generation and artificial life. However, their widespread application has been hampered by significant computational requirements. In this work, we introduce the Latent Neural Cellular Automata (LNCA) model, a novel architecture designed to address the resource limitations of neural cellular automata. Our approach shifts the computation from the conventional input space to a specially designed latent space, relying on a pre-trained autoencoder. We apply our model in the context of image restoration, which aims to reconstruct high-quality images from their degraded versions. This modification not only reduces the model's resource consumption but also maintains a flexible framework suitable for various applications. Our model achieves a significant reduction in computational requirements while maintaining high reconstruction fidelity. This increase in efficiency allows for inputs up to 16 times larger than current state-of-the-art neural cellular automata models, using the same resources.