Goto

Collaborating Authors

 Iacca, Giovanni


Rethinking Few-Shot Adaptation of Vision-Language Models in Two Stages

arXiv.org Artificial Intelligence

An old-school recipe for training a classifier is to (i) learn a good feature extractor and (ii) optimize a linear layer atop. When only a handful of samples are available per category, as in Few-Shot Adaptation (FSA), data are insufficient to fit a large number of parameters, rendering the above impractical. This is especially true with large pre-trained Vision-Language Models (VLMs), which motivated successful research at the intersection of Parameter-Efficient Fine-tuning (PEFT) and FSA. In this work, we start by analyzing the learning dynamics of PEFT techniques when trained on few-shot data from only a subset of categories, referred to as the ``base'' classes. We show that such dynamics naturally splits into two distinct phases: (i) task-level feature extraction and (ii) specialization to the available concepts. To accommodate this dynamic, we then depart from prompt- or adapter-based methods and tackle FSA differently. Specifically, given a fixed computational budget, we split it to (i) learn a task-specific feature extractor via PEFT and (ii) train a linear classifier on top. We call this scheme Two-Stage Few-Shot Adaptation (2SFS). Differently from established methods, our scheme enables a novel form of selective inference at a category level, i.e., at test time, only novel categories are embedded by the adapted text encoder, while embeddings of base categories are available within the classifier. Results with fixed hyperparameters across two settings, three backbones, and eleven datasets, show that 2SFS matches or surpasses the state-of-the-art, while established methods degrade significantly across settings.


2SSP: A Two-Stage Framework for Structured Pruning of LLMs

arXiv.org Artificial Intelligence

We propose a novel Two-Stage framework for Structured Pruning (2SSP) for pruning Large Language Models (LLMs), which combines two different strategies of pruning, namely Width and Depth Pruning. The first stage (Width Pruning) removes entire neurons, hence their corresponding rows and columns, aiming to preserve the connectivity among the pruned structures in the intermediate state of the Feed-Forward Networks in each Transformer block. This is done based on an importance score measuring the impact of each neuron over the output magnitude. The second stage (Depth Pruning), instead, removes entire Attention submodules. This is done by applying an iterative process that removes the Attention submodules with the minimum impact on a given metric of interest (in our case, perplexity). We also propose a novel mechanism to balance the sparsity rate of the two stages w.r.t. to the desired global sparsity. We test 2SSP on four LLM families and three sparsity rates (25\%, 37.5\%, and 50\%), measuring the resulting perplexity over three language modeling datasets as well as the performance over six downstream tasks. Our method consistently outperforms five state-of-the-art competitors over three language modeling and six downstream tasks, with an up to two-order-of-magnitude gain in terms of pruning time. The code is available at available at \url{https://github.com/FabrizioSandri/2SSP}.


Zeroth-Order Adaptive Neuron Alignment Based Pruning without Re-Training

arXiv.org Artificial Intelligence

Network pruning focuses on computational techniques that aim to reduce a given model's computational cost by removing a subset of its parameters while having minimal impact on performance. Throughout the last decade, the most widely used pruning paradigm has been pruning and re-training, which nowadays is inconvenient due to the vast amount of pre-trained models, which are in any case too expensive to re-train. In this paper, we exploit functional information from dense pre-trained models, i.e., their activations, to obtain sparse models that maximize the activations' alignment w.r.t. their corresponding dense models. Hence, we propose \textsc{NeuroAL}, a \emph{top-up} algorithm that can be used on top of any given pruning algorithm for LLMs, which modifies the block-wise and row-wise sparsity exploiting information from both the dense model and its sparse version to maximize the \emph{neuron alignment} among activations. Differently from existing methods, our approach adaptively selects the best hyperparameters for the block-wise and row-wise sparsity ratios w.r.t. the model and the desired sparsity, and requires \emph{no re-training}. We test our method over 276 cases combining four LLM families, three sparsity ratios, and ten language tasks (three language modeling and seven zero-shot datasets), showing how it consistently outperforms the latest state-of-the-art methods in terms of performance-runtime trade-off. The code is available at \href{https://github.com/eliacunegatti/NeuroAL}{https://github.com/eliacunegatti/NeuroAL}.


SMOSE: Sparse Mixture of Shallow Experts for Interpretable Reinforcement Learning in Continuous Control Tasks

arXiv.org Artificial Intelligence

Continuous control tasks often involve high-dimensional, dynamic, and non-linear environments. State-of-the-art performance in these tasks is achieved through complex closed-box policies that are effective, but suffer from an inherent opacity. Interpretable policies, while generally underperforming compared to their closed-box counterparts, advantageously facilitate transparent decision-making within automated systems. Hence, their usage is often essential for diagnosing and mitigating errors, supporting ethical and legal accountability, and fostering trust among stakeholders. In this paper, we propose SMOSE, a novel method to train sparsely activated interpretable controllers, based on a top-1 Mixture-of-Experts architecture. SMOSE combines a set of interpretable decisionmakers, trained to be experts in different basic skills, and an interpretable router that assigns tasks among the experts. The training is carried out via state-of-the-art Reinforcement Learning algorithms, exploiting load-balancing techniques to ensure fair expert usage. We then distill decision trees from the weights of the router, significantly improving the ease of interpretation. We evaluate SMOSE on six benchmark environments from MuJoCo: our method outperforms recent interpretable baselines and narrows the gap with noninterpretable state-of-the-art algorithms


Frustratingly Easy Test-Time Adaptation of Vision-Language Models

arXiv.org Artificial Intelligence

Vision-Language Models seamlessly discriminate among arbitrary semantic categories, yet they still suffer from poor generalization when presented with challenging examples. For this reason, Episodic Test-Time Adaptation (TTA) strategies have recently emerged as powerful techniques to adapt VLMs in the presence of a single unlabeled image. The recent literature on TTA is dominated by the paradigm of prompt tuning by Marginal Entropy Minimization, which, relying on online backpropagation, inevitably slows down inference while increasing memory. In this work, we theoretically investigate the properties of this approach and unveil that a surprisingly strong TTA method lies dormant and hidden within it. We term this approach ZERO (TTA with "zero" temperature), whose design is both incredibly effective and frustratingly simple: augment N times, predict, retain the most confident predictions, and marginalize after setting the Softmax temperature to zero. Remarkably, ZERO requires a single batched forward pass through the vision encoder only and no backward passes. We thoroughly evaluate our approach following the experimental protocol established in the literature and show that ZERO largely surpasses or compares favorably w.r.t. the state-of-the-art while being almost 10x faster and 13x more memory-friendly than standard Test-Time Prompt Tuning. Thanks to its simplicity and comparatively negligible computation, ZERO can serve as a strong baseline for future work in this field. The code is available at https://github.com/FarinaMatteo/zero.


Neuron-centric Hebbian Learning

arXiv.org Artificial Intelligence

One of the most striking capabilities behind the learning mechanisms of the brain is the adaptation, through structural and functional plasticity, of its synapses. While synapses have the fundamental role of transmitting information across the brain, several studies show that it is the neuron activations that produce changes on synapses. Yet, most plasticity models devised for artificial Neural Networks (NNs), e.g., the ABCD rule, focus on synapses, rather than neurons, therefore optimizing synaptic-specific Hebbian parameters. This approach, however, increases the complexity of the optimization process since each synapse is associated to multiple Hebbian parameters. To overcome this limitation, we propose a novel plasticity model, called Neuron-centric Hebbian Learning (NcHL), where optimization focuses on neuron- rather than synaptic-specific Hebbian parameters. Compared to the ABCD rule, NcHL reduces the parameters from $5W$ to $5N$, being $W$ and $N$ the number of weights and neurons, and usually $N \ll W$. We also devise a ``weightless'' NcHL model, which requires less memory by approximating the weights based on a record of neuron activations. Our experiments on two robotic locomotion tasks reveal that NcHL performs comparably to the ABCD rule, despite using up to $\sim97$ times less parameters, thus allowing for scalable plasticity


Many-Objective Evolutionary Influence Maximization: Balancing Spread, Budget, Fairness, and Time

arXiv.org Artificial Intelligence

The Influence Maximization (IM) problem seeks to discover the set of nodes in a graph that can spread the information propagation at most. This problem is known to be NP-hard, and it is usually studied by maximizing the influence (spread) and, optionally, optimizing a second objective, such as minimizing the seed set size or maximizing the influence fairness. However, in many practical scenarios multiple aspects of the IM problem must be optimized at the same time. In this work, we propose a first case study where several IM-specific objective functions, namely budget, fairness, communities, and time, are optimized on top of the maximization of influence and minimization of the seed set size. To this aim, we introduce MOEIM (Many-Objective Evolutionary Algorithm for Influence Maximization) a Multi-Objective Evolutionary Algorithm (MOEA) based on NSGA-II incorporating graph-aware operators and a smart initialization. We compare MOEIM in two experimental settings, including a total of nine graph datasets, two heuristic methods, a related MOEA, and a state-of-the-art Deep Learning approach. The experiments show that MOEIM overall outperforms the competitors in most of the tested many-objective settings. To conclude, we also investigate the correlation between the objectives, leading to novel insights into the topic. The codebase is available at https://github.com/eliacunegatti/MOEIM.


Awareness in robotics: An early perspective from the viewpoint of the EIC Pathfinder Challenge "Awareness Inside''

arXiv.org Artificial Intelligence

Consciousness has been historically a heavily debated topic in engineering, science, and philosophy. On the contrary, awareness had less success in raising the interest of scholars in the past. However, things are changing as more and more researchers are getting interested in answering questions concerning what awareness is and how it can be artificially generated. The landscape is rapidly evolving, with multiple voices and interpretations of the concept being conceived and techniques being developed. The goal of this paper is to summarize and discuss the ones among these voices that are connected with projects funded by the EIC Pathfinder Challenge called "Awareness Inside", a nonrecurring call for proposals within Horizon Europe that was designed specifically for fostering research on natural and synthetic awareness. In this perspective, we dedicate special attention to challenges and promises of applying synthetic awareness in robotics, as the development of mature techniques in this new field is expected to have a special impact on generating more capable and trustworthy embodied systems.


Social Interpretable Reinforcement Learning

arXiv.org Artificial Intelligence

Reinforcement Learning (RL) bears the promise of being an enabling technology for many applications. However, since most of the literature in the field is currently focused on opaque models, the use of RL in high-stakes scenarios, where interpretability is crucial, is still limited. Recently, some approaches to interpretable RL, e.g., based on Decision Trees, have been proposed, but one of the main limitations of these techniques is their training cost. To overcome this limitation, we propose a new population-based method, called Social Interpretable RL (SIRL), inspired by social learning principles, to improve learning efficiency. Our method mimics a social learning process, where each agent in a group learns to solve a given task based both on its own individual experience as well as the experience acquired together with its peers. Our approach is divided into two phases. In the \emph{collaborative phase}, all the agents in the population interact with a shared instance of the environment, where each agent observes the state and independently proposes an action. Then, voting is performed to choose the action that will actually be performed in the environment. In the \emph{individual phase}, each agent refines its individual performance by interacting with its own instance of the environment. This mechanism makes the agents experience a larger number of episodes while simultaneously reducing the computational cost of the process. Our results on six well-known benchmarks show that SIRL reaches state-of-the-art performance w.r.t. the alternative interpretable methods from the literature.


Peeking inside Sparse Neural Networks using Multi-Partite Graph Representations

arXiv.org Artificial Intelligence

Modern Deep Neural Networks (DNNs) have achieved very high performance at the expense of computational resources. To decrease the computational burden, several techniques have proposed to extract, from a given DNN, efficient subnetworks which are able to preserve performance while reducing the number of network parameters. The literature provides a broad set of techniques to discover such subnetworks, but few works have studied the peculiar topologies of such pruned architectures. In this paper, we propose a novel \emph{unrolled input-aware} bipartite Graph Encoding (GE) that is able to generate, for each layer in an either sparse or dense neural network, its corresponding graph representation based on its relation with the input data. We also extend it into a multipartite GE, to capture the relation between layers. Then, we leverage on topological properties to study the difference between the existing pruning algorithms and algorithm categories, as well as the relation between topologies and performance.