

Modular Meta-Learning with Shrinkage

Neural Information Processing Systems

Many real-world problems, including multi-speaker text-to-speech synthesis, can greatly benefit from the ability to meta-learn large models with only a few task-specific components. Updating only these task-specific modules then allows the model to be adapted to low-data tasks for as many steps as necessary without risking overfitting. Unfortunately, existing meta-learning methods either do not scale to long adaptation or else rely on handcrafted task-specific architectures. Here, we propose a meta-learning approach that obviates the need for this often sub-optimal hand-selection. In particular, we develop general techniques based on Bayesian shrinkage to automatically discover and learn both task-specific and general reusable modules. Empirically, we demonstrate that our method discovers a small set of meaningful task-specific modules and outperforms existing meta-learning approaches in domains like few-shot text-to-speech that have little task data and long adaptation horizons. We also show that existing meta-learning methods including MAML, iMAML, and Reptile emerge as special cases of our method.
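To make the shrinkage mechanism concrete, here is a minimal one-dimensional sketch of the idea the abstract describes, not the authors' implementation: each module m keeps meta-learned parameters phi_m and a learned shrinkage scale sigma_m, and task adaptation minimizes the task loss plus a Gaussian penalty ||theta_m - phi_m||^2 / (2 sigma_m^2). Modules with small sigma_m are effectively frozen; modules with large sigma_m adapt freely. All names (`adapt_modules`, the encoder/decoder toy task, the chosen sigmas) are illustrative assumptions.

```python
import numpy as np

def adapt_modules(phi, sigma, grad_loss, lr=0.05, steps=500):
    """Adapt per-module parameters theta toward a task loss while a Gaussian
    shrinkage prior N(phi_m, sigma_m^2) pulls each module back to its
    meta-learned value phi_m. Small sigma_m => the module barely moves."""
    theta = {m: p.copy() for m, p in phi.items()}
    for _ in range(steps):
        for m in theta:
            # gradient of task loss plus the prior penalty ||theta - phi||^2 / (2 sigma^2)
            g = grad_loss(m, theta[m]) + (theta[m] - phi[m]) / (sigma[m] ** 2)
            theta[m] -= lr * g
    return theta

# Toy task: the loss 0.5*||theta - target||^2 pulls every module toward a
# task-specific target; only the loosely tied "decoder" is allowed to follow it.
phi = {"encoder": np.zeros(2), "decoder": np.zeros(2)}
sigma = {"encoder": 0.3, "decoder": 3.0}   # decoder acts as the task-specific module
target = np.array([1.0, -1.0])
grad = lambda m, th: th - target           # gradient of the quadratic task loss

theta = adapt_modules(phi, sigma, grad)
# decoder ends up close to the target; encoder stays near its meta-learned value
```

Because the penalty, not a hand-picked architecture, decides which modules move, adaptation can run for many steps without the tightly shrunk modules overfitting, which is the scaling property the abstract emphasizes.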


Reviews: Neural Relational Inference with Fast Modular Meta-learning

Neural Information Processing Systems

This paper is unbalanced in two ways. First, the space devoted to background versus contributions is skewed too heavily toward discussing prior work, with too little focus on explaining the contributions of this work. Second, the coverage of the literature concentrates on graph networks and meta-learning but neglects prior work on (non-graph-based) modular networks and on learned proposal distributions. Regarding the first imbalance, the section on lines 201-235 is by far the most important content in the paper, yet it is positioned almost as an afterthought to the extensive exposition of Alet et al. (2018). The paper would be much stronger if other sections were shortened and the descriptions in this region were substantially expanded (e.g.


Review for NeurIPS paper: Modular Meta-Learning with Shrinkage

Neural Information Processing Systems

Weaknesses:

A. Major concerns

1. Can you comment on the choice of a Normal prior for your shrinkage variants, as opposed to a sparsity-inducing prior such as a Laplace or a spike-and-slab prior? Sparsity-inducing priors would likely be more natural for modularity, since some layers would then require no adaptation at all rather than a small adaptation. The experiments do show that the sigma versions of the different algorithms learn different scales of adaptation. However, there is no experiment demonstrating the benefit of these approaches for the aspects that motivated them (interpretability, causality, transfer learning, or domain adaptation) beyond standard performance in the few-shot learning setting.

B. Moderate concerns

1. Lines 27-28: "As data increases, these hard-coded modules may become a bottleneck for further improvement." All the experiments in this paper are in the few-shot learning setting.
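The reviewer's distinction between the two prior families can be illustrated with a one-dimensional sketch (hypothetical, not the paper's model): under a Gaussian prior, the MAP estimate of a module's deviation from its meta-learned value is shrunk proportionally and is never exactly zero, while under a Laplace prior the MAP estimate soft-thresholds, so small deviations snap to exactly zero and the module is marked as entirely task-independent.

```python
import numpy as np

def gaussian_shrink(delta, lam):
    # MAP of 0.5*(x - delta)^2 + 0.5*lam*x^2: proportional shrinkage, never exactly zero
    return delta / (1.0 + lam)

def laplace_shrink(delta, lam):
    # MAP of 0.5*(x - delta)^2 + lam*|x|: soft thresholding, small deviations become 0
    return np.sign(delta) * np.maximum(np.abs(delta) - lam, 0.0)

deltas = np.array([0.05, 0.5, 2.0])  # hypothetical per-module deviations theta_m - phi_m
g = gaussian_shrink(deltas, 0.3)     # all entries shrunk, none exactly zero
l = laplace_shrink(deltas, 0.3)      # 0.05 is thresholded to exactly 0.0
```

This is why a sparsity-inducing prior could yield the crisper "adapt or don't adapt" modularity the review asks about, whereas the Normal prior only ever scales adaptation down.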


Review for NeurIPS paper: Modular Meta-Learning with Shrinkage

Neural Information Processing Systems

This paper presents a modular meta-learning method in which Bayesian shrinkage is used to decide which modules to update for a new task. All reviewers agree that the modular approach has a clear motivation, is theoretically sound, and is thoroughly validated by a large set of experiments.



Modular meta-learning in abstract graph networks for combinatorial generalization

Alet, Ferran, Bauza, Maria, Rodriguez, Alberto, Lozano-Perez, Tomas, Kaelbling, Leslie P.

arXiv.org Machine Learning

Modular meta-learning is a new framework that generalizes to unseen datasets by combining a small set of neural modules in different ways. In this work we propose abstract graph networks: using graphs as abstractions of a system's subparts without a fixed assignment of nodes to system subparts, for which we would need supervision. We combine this idea with modular meta-learning to get a flexible framework with combinatorial generalization to new tasks built in. We then use it to model the pushing of arbitrarily shaped objects from little or no training data.


Modular meta-learning

Alet, Ferran, Lozano-Pérez, Tomás, Kaelbling, Leslie P.

arXiv.org Machine Learning

In many situations, such as robot learning, training experience is very expensive. One strategy for reducing the amount of training data needed for a new task is to learn some form of prior or bias using data from several related tasks. The objective of this process is to extract information that will substantially reduce the training-data requirements for a new task. This problem is a form of transfer learning, sometimes also called meta-learning or "learning to learn" [1, 2]. Previous approaches to meta-learning for robotics have focused on finding distributions over [3] or initial values of [4, 5] parameters, based on a set of "training tasks," that will enable a new "test task" to be learned with many fewer training examples. Our objective is similar, but rather than focusing on transferring information about parameter values, we focus on finding a reusable set of modules that can form components of a solution to a new task, possibly with a small amount of tuning. Modular approaches to learning have been very successful in structured tasks such as natural-language sentence interpretation [6], in which the input signal gives relatively direct information about a good structural decomposition of the problem. We wish to address problems that may benefit from a modular decomposition but do not provide any task-level input from which the structure of a solution may be derived. Nonetheless, we adopt a similar modular structure and parameter-adaptation method for learning our reusable modules, but use a general-purpose simulated-annealing search strategy to find an appropriate structural decomposition for each new task.
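The simulated-annealing structure search mentioned in the last sentence can be sketched roughly as follows. This is an illustrative toy, not the paper's algorithm: the module names, the two-slot composition, and the scoring function are all made up, and in the actual work `score` would be task performance after tuning the composed neural modules.

```python
import math
import random

def anneal_structure(modules, score, steps=2000, t0=1.0, seed=0):
    """Simulated-annealing search over which modules to compose for a new task.
    `score(structure)` returns task performance (higher is better)."""
    rng = random.Random(seed)
    current = tuple(rng.sample(modules, 2))       # random initial two-module composition
    best = current
    for i in range(steps):
        temp = t0 * (1 - i / steps) + 1e-9        # linear cooling schedule
        proposal = list(current)
        proposal[rng.randrange(len(proposal))] = rng.choice(modules)  # mutate one slot
        proposal = tuple(proposal)
        delta = score(proposal) - score(current)
        # always accept improvements; accept worse structures with Boltzmann probability
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current = proposal
        if score(current) > score(best):
            best = current
    return best

# Toy usage: the score counts how many slots match a hidden "correct" composition.
modules = ["conv", "mlp", "attn", "rnn"]
target = ("attn", "mlp")
score = lambda s: -sum(a != b for a, b in zip(s, target))
best = anneal_structure(modules, score)
```

The high-temperature phase lets the search escape poor compositions; as the temperature cools, it settles into the best-scoring structure, which is the general-purpose behavior the abstract relies on in place of task-level structural supervision.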