ES-MAML: Simple Hessian-Free Meta Learning

Xingyou Song, Wenbo Gao, Yuxiang Yang, Krzysztof Choromanski, Aldo Pacchiano, Yunhao Tang

arXiv.org Artificial Intelligence 

Meta-learning is a paradigm in machine learning which aims to develop models and training algorithms that can quickly adapt to new tasks and data. Our focus in this paper is on meta-learning in reinforcement learning (RL), where data efficiency is of paramount importance because gathering new samples often requires costly simulations or interactions with the real world. A popular technique for RL meta-learning is Model-Agnostic Meta-Learning (MAML) (Finn et al., 2017, 2018), an algorithm for training an agent (the meta-policy) which can quickly adapt to new and unknown tasks by performing one (or a few) gradient updates in the new environment. We provide a formal description of MAML in Section 2. MAML has proven to be successful for many applications. However, implementing and running MAML continues to be challenging. One major complication is that the standard version of MAML requires estimating second derivatives of the RL reward function, which is difficult when using backpropagation on stochastic policies; indeed, the original implementation of MAML (Finn et al., 2017) did so incorrectly, which spurred the development of unbiased higher-order estimators (DiCE, (Foerster et al., 2018)) and further analysis of the credit assignment mechanism in MAML (Rothfuss et al., 2019).
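To make the second-derivative issue concrete, here is a sketch of the standard one-step MAML objective (the paper's formal description is in Section 2); the notation used below, with per-task expected reward $f_T$ and adaptation step size $\alpha$, is assumed for illustration:

$$J(\theta) \;=\; \mathbb{E}_{T \sim \mathcal{P}(\mathcal{T})}\Big[\, f_T\big(\theta + \alpha \nabla_\theta f_T(\theta)\big) \,\Big],$$

whose gradient, by the chain rule applied to the inner adaptation step, contains the Hessian of the task reward:

$$\nabla_\theta J(\theta) \;=\; \mathbb{E}_{T \sim \mathcal{P}(\mathcal{T})}\Big[\, \big(I + \alpha \nabla^2_\theta f_T(\theta)\big)\, \nabla_\theta f_T\big(\theta + \alpha \nabla_\theta f_T(\theta)\big) \,\Big].$$

It is this Hessian term $\nabla^2_\theta f_T(\theta)$ that is difficult to estimate via backpropagation on stochastic policies, and that the Hessian-free approach of the title seeks to avoid.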
