Goto

Collaborating Authors

 Patacchiola, Massimiliano


Transformer Neural Autoregressive Flows

arXiv.org Artificial Intelligence

Density estimation, a central problem in machine learning, can be performed using Normalizing Flows (NFs). NFs comprise a sequence of invertible transformations, that turn a complex target distribution into a simple one, by exploiting the change of variables theorem. Neural Autoregressive Flows (NAFs) and Block Neural Autoregressive Flows (B-NAFs) are arguably the most perfomant members of the NF family. However, they suffer scalability issues and training instability due to the constraints imposed on the network structure. In this paper, we propose a novel solution to these challenges by exploiting transformers to define a new class of neural flows called Transformer Neural Autoregressive Flows (T-NAFs). T-NAFs treat each dimension of a random variable as a separate input token, using attention masking to enforce an autoregressive constraint. We take an amortization-inspired approach where the transformer outputs the parameters of an invertible transformation. The experimental results demonstrate that T-NAFs consistently match or outperform NAFs and B-NAFs across multiple datasets from the UCI benchmark. Remarkably, T-NAFs achieve these results using an order of magnitude fewer parameters than previous approaches, without composing multiple flows.


Comparing the Efficacy of Fine-Tuning and Meta-Learning for Few-Shot Policy Imitation

arXiv.org Artificial Intelligence

In this paper we explore few-shot imitation learning for control problems, which involves learning to imitate a target policy by accessing a limited set of offline rollouts. This setting has been relatively under-explored despite its relevance to robotics and control applications. State-of-the-art methods developed to tackle few-shot imitation rely on meta-learning, which is expensive to train as it requires access to a distribution over tasks (rollouts from many target policies and variations of the base environment). Given this limitation we investigate an alternative approach, fine-tuning, a family of methods that pretrain on a single dataset and then fine-tune on unseen domain-specific data. Recent work has shown that fine-tuners outperform meta-learners in few-shot image classification tasks, especially when the data is out-of-domain. Here we evaluate to what extent this is true for control problems, proposing a simple yet effective baseline which relies on two stages: (i) training a base policy online via reinforcement learning (e.g. Soft Actor-Critic) on a single base environment, (ii) fine-tuning the base policy via behavioral cloning on a few offline rollouts of the target policy. Despite its simplicity this baseline is competitive with meta-learning methods on a variety of conditions and is able to imitate target policies trained on unseen variations of the original environment. Importantly, the proposed approach is practical and easy to implement, as it does not need any complex meta-training protocol. As a further contribution, we release an open source dataset called iMuJoCo (iMitation MuJoCo) consisting of 154 variants of popular OpenAI-Gym MuJoCo environments with associated pretrained target policies and rollouts, which can be used by the community to study few-shot imitation learning and offline reinforcement learning.


FiT: Parameter Efficient Few-shot Transfer Learning for Personalized and Federated Image Classification

arXiv.org Machine Learning

Modern deep learning systems are increasingly deployed in situations such as personalization and federated learning where it is necessary to support i) learning on small amounts of data, and ii) communication efficient distributed training protocols. In this work, we develop FiLM Transfer (FiT) which fulfills these requirements in the image classification setting by combining ideas from transfer learning (fixed pretrained backbones and fine-tuned FiLM adapter layers) and meta-learning (automatically configured Naive Bayes classifiers and episodic training) to yield parameter efficient models with superior classification accuracy at low-shot. The resulting parameter efficiency is key for enabling few-shot learning, inexpensive model updates for personalization, and communication efficient federated learning. We experiment with FiT on a wide range of downstream datasets and show that it achieves better classification accuracy than the leading Big Transfer (BiT) algorithm at low-shot and achieves state-of-the art accuracy on the challenging VTAB-1k benchmark, with fewer than 1% of the updateable parameters. Finally, we demonstrate the parameter efficiency and superior accuracy of FiT in distributed low-shot applications including model personalization and federated learning where model update size is an important performance metric.


Memory Efficient Meta-Learning with Large Images

arXiv.org Machine Learning

Meta learning approaches to few-shot classification are computationally efficient at test time requiring just a few optimization steps or single forward pass to learn a new task, but they remain highly memory-intensive to train. This limitation arises because a task's entire support set, which can contain up to 1000 images, must be processed before an optimization step can be taken. Harnessing the performance gains offered by large images thus requires either parallelizing the meta-learner across multiple GPUs, which may not be available, or trade-offs between task and image size when memory constraints apply. We improve on both options by proposing LITE, a general and memory efficient episodic training scheme that enables meta-training on large tasks composed of large images on a single GPU. We achieve this by observing that the gradients for a task can be decomposed into a sum of gradients over the task's training images. This enables us to perform a forward pass on a task's entire training set but realize significant memory savings by back-propagating only a random subset of these images which we show is an unbiased approximation of the full gradient. We use LITE to train meta-learners and demonstrate new state-of-the-art accuracy on the real-world ORBIT benchmark and 3 of the 4 parts of the challenging VTAB+MD benchmark relative to leading meta-learners. LITE also enables meta-learners to be competitive with transfer learning approaches but at a fraction of the test-time computational cost, thus serving as a counterpoint to the recent narrative that transfer learning is all you need for few-shot classification.


Few-Shot Learning with Class Imbalance

arXiv.org Artificial Intelligence

Abstract--Few-Shot Learning (FSL) algorithms are commonly trained through Meta-Learning (ML), which exposes models to batches of tasks sampled from a meta-dataset to mimic tasks seen during evaluation. However, the standard training procedures overlook the real-world dynamics where classes commonly occur at different frequencies. While it is generally understood that class imbalance harms the performance of supervised methods, limited research examines the impact of imbalance on the FSL evaluation task. Our analysis compares 10 state-of-the-art meta-learning and FSL methods on different imbalance distributions and rebalancing techniques. Our results reveal that 1) some FSL methods display a natural disposition against imbalance while most other approaches produce a performance drop by up to 17% compared to the balanced task without the appropriate mitigation; 2) contrary to popular belief, many meta-learning algorithms will not automatically learn to balance from exposure to imbalanced training tasks; 3) classical rebalancing strategies, such as random oversampling, can still be very effective, leading to state-of-the-art performances and should not be overlooked; 4) FSL methods are more robust against meta-dataset imbalance than imbalance at the task-level with a similar imbalance ratio ( ฯ < 20), with the effect holding even in long-tail datasets under a larger imbalance ( ฯ = 65). We identify well to new examples. However, large datasets can be costly and examine three levels of class imbalance: task-level, to obtain and annotate [1]. This is a particularly limiting dataset-level, and combined (task-level and dataset-level) issue in many real-world situations due to the need to perform imbalance. In contrast to previous work on CIFSL [12], [13], real-time operations, the presence of rare categories, [14], [15], we explicitly attribute and quantify the impact on or the desire for a good user experience [2], [3], [4], [5]. the performance caused by class imbalance for each model. Few-Shot Learning (FSL) alleviates this burden by defining Moreover, we study multiple class imbalance distributions, a distribution over tasks, with each task containing a few giving a realistic assessment of performance and revealing labeled data points (support set) and a set of target data previously unknown strengths and weaknesses of 10 stateof-the-art (query set) belonging to the same set of classes. Additionally, we offer practical advice, way to train FSL methods is through Meta-Learning (ML). Figure 1 the model is repeatedly exposed to batches of tasks sampled shows a graphical representation of the CIFSL problem.


Self-Supervised Relational Reasoning for Representation Learning

arXiv.org Machine Learning

In self-supervised learning, a system is tasked with achieving a surrogate objective by defining alternative targets on a set of unlabeled data. The aim is to build useful representations that can be used in downstream tasks, without costly manual annotation. In this work, we propose a novel self-supervised formulation of relational reasoning that allows a learner to bootstrap a signal from information implicit in unlabeled data. Training a relation head to discriminate how entities relate to themselves (intra-reasoning) and other entities (inter-reasoning), results in rich and descriptive representations in the underlying neural network backbone, which can be used in downstream tasks such as classification and image retrieval. We evaluate the proposed method following a rigorous experimental procedure, using standard datasets, protocols, and backbones. Self-supervised relational reasoning outperforms the best competitor in all conditions by an average 14% in accuracy, and the most recent state-of-the-art model by 3%. We link the effectiveness of the method to the maximization of a Bernoulli log-likelihood, which can be considered as a proxy for maximizing the mutual information, resulting in a more efficient objective with respect to the commonly used contrastive losses.


Deep Kernel Transfer in Gaussian Processes for Few-shot Learning

arXiv.org Machine Learning

Here, we use the nomenclature derived from the meta-learning literature which is the most prevalent at time of writing. Let S {( x l,y l)} L l 1 be a support-set containing input-output pairs, with L equal to one (1-shot) or five (5-shot), and Q { (x m,y m)} M m 1be a query-set (sometimes referred to in the literature as a target-set), with M typically one order of magnitude greater than L. For ease of notation, the support and query sets are grouped in a task T {S, Q}, with the dataset D {T n} N n 1 defined as a collection of such tasks. Models are trained on random tasks sampled from D . Then, given a new task T {S, Q } sampled from a test set, the objective is to condition the model on the samples of the support S to estimate the membership of the samples in the query set Q . In the most common scenario, the inputs x D belong to the same distribution p(x) and are distributed across training, validation, and test sets such that their class membership is non-overlapping. Note that y can be a continuous value (regression) or a discrete one (classification), even though most of the previous work has focused on classification. We also consider the cross-domain scenario, where the inputs are sampled from different distributions at training and test time; this is more representative of real-world scenarios.


Autonomous Quadrotor Landing using Deep Reinforcement Learning

arXiv.org Artificial Intelligence

Landing an unmanned aerial vehicle (UAV) on a ground marker is an open problem despite the effort of the research community. Previous attempts mostly focused on the analysis of hand-crafted geometric features and the use of external sensors in order to allow the vehicle to approach the land-pad. In this article, we propose a method based on deep reinforcement learning that only requires low-resolution images taken from a down-looking camera in order to identify the position of the marker and land the UAV on it. The proposed approach is based on a hierarchy of Deep Q-Networks (DQNs) used as high-level control policy for the navigation toward the marker. We implemented different technical solutions, such as the combination of vanilla and double DQNs, and a partitioned buffer replay. Using domain randomization we trained the vehicle on uniform textures and we tested it on a large variety of simulated and real-world environments. The overall performance is comparable with a state-of-the-art algorithm and human pilots.