Decoupling Adaptation from Modeling with Meta-Optimizers for Meta Learning

Arnold, Sébastien M. R., Iqbal, Shariq, Sha, Fei

Oct-29-2019–arXiv.org Machine Learning

Meta-learning methods, most notably Model-Agnostic Meta-Learning (Finn et al., 2017) or MAML, have achieved great success in adapting to new tasks quickly, after having been trained on similar tasks. The mechanism behind their success, however, is poorly understood. We begin this work with an experimental analysis of MAML, finding that deep models are crucial for its success, even given sets of simple tasks where a linear model would suffice on any individual task. Furthermore, on image-recognition tasks, we find that the early layers of MAML-trained models learn task-invariant features, while later layers are used for adaptation, providing further evidence that these models require greater capacity than is strictly necessary for their individual tasks. Following our findings, we propose a method which enables better use of model capacity at inference time by separating the adaptation aspect of meta-learning into parameters that are only used for adaptation but are not part of the forward model. We find that our approach enables more effective meta-learning in smaller models, which are suitably sized for the individual tasks.

Meta-learning, or learning to learn, is an appealing notion due to its potential in addressing important challenges when applying machine learning to real-world problems. In particular, learning from prior tasks while being able to adapt quickly to new tasks improves learning efficiency, model robustness, etc. A promising set of techniques, Model-Agnostic Meta-Learning (Finn et al., 2017) or MAML, and its variants, have received a lot of interest (Nichol et al., 2018; Lee & Choi, 2018; Grant et al., 2018). However, despite several efforts, understanding of how MAML works, either theoretically or in practice, has been lacking (Finn & Levine, 2018; Fallah et al., 2019). For a model that meta-learns, its parameters need to encode not only the common knowledge extracted from the tasks it has seen, which forms a task-general inductive bias, but also the capability to adapt to new test tasks (similar to those it has seen) with task-specific knowledge.
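To make the idea of adaptation parameters that sit outside the forward model concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' method: the excerpt does not specify the form of the meta-optimizer, so the sketch assumes a fixed matrix `precond` that linearly transforms the gradient during adaptation. The names `predict`, `mse_grad`, `adapt`, and `precond` are hypothetical, and meta-training of `precond` is omitted entirely.

```python
# Sketch only: a linear model adapted with a gradient transformed by a
# separate matrix. The matrix `precond` is an assumed stand-in for
# "parameters used only for adaptation"; it never appears in predict().

def predict(theta, x):
    """Forward model: uses only the model parameters theta = [w, b]."""
    w, b = theta
    return w * x + b

def mse(theta, xs, ys):
    """Mean squared error of the forward model on one task's data."""
    return sum((predict(theta, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def mse_grad(theta, xs, ys):
    """Analytic gradient of the mean squared error with respect to [w, b]."""
    n = len(xs)
    gw = sum(2 * (predict(theta, x) - y) * x for x, y in zip(xs, ys)) / n
    gb = sum(2 * (predict(theta, x) - y) for x, y in zip(xs, ys)) / n
    return [gw, gb]

def adapt(theta, precond, xs, ys, lr=0.1, steps=1):
    """Task adaptation: theta <- theta - lr * (precond @ grad)."""
    theta = list(theta)
    for _ in range(steps):
        g = mse_grad(theta, xs, ys)
        # Matrix-vector product: the adaptation-only parameters act here.
        update = [sum(p * gj for p, gj in zip(row, g)) for row in precond]
        theta = [t - lr * u for t, u in zip(theta, update)]
    return theta

# Toy task: y = 2x + 1, observed at three support points.
theta0 = [0.0, 0.0]
identity = [[1.0, 0.0], [0.0, 1.0]]  # identity recovers plain gradient descent
support_x = [-1.0, 0.0, 1.0]
support_y = [-1.0, 1.0, 3.0]

adapted = adapt(theta0, identity, support_x, support_y, lr=0.1, steps=5)
```

With the identity matrix, `adapt` reduces to the ordinary gradient steps used in MAML's inner loop. A non-identity `precond` changes how the model adapts without adding any parameters to `predict`, which is the separation the abstract describes at a high level.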


