Improving Generalization in Meta Reinforcement Learning using Learned Objectives

Louis Kirsch, Sjoerd van Steenkiste, Jürgen Schmidhuber

arXiv.org Artificial Intelligence 

ABSTRACT

Biological evolution has distilled the experiences of many learners into the general learning algorithms of humans. Our novel meta-reinforcement learning algorithm MetaGenRL is inspired by this process. MetaGenRL distills the experiences of many complex agents to meta-learn a low-complexity neural objective function that affects how future individuals will learn. Unlike recent meta-RL algorithms, MetaGenRL can generalize to new environments that are entirely different from those used for meta-training. In some cases, it even outperforms human-engineered RL algorithms. MetaGenRL uses off-policy second-order gradients during meta-training that greatly increase its sample efficiency.

1 INTRODUCTION

The process of evolution has equipped humans with incredibly general learning algorithms. These allow us to flexibly solve a wide range of problems, even in the absence of many related prior experiences. The inductive biases that give rise to these capabilities are the result of distilling the collective experiences of many learners throughout the course of natural evolution. By essentially learning from learning experiences in this way, this knowledge can be compactly encoded in the genetic code of an individual to give rise to the general learning capabilities that we observe today.

In contrast, Reinforcement Learning (RL) in artificial agents rarely proceeds in this way. The learning rules used to train agents are the result of years of human engineering and design. Correspondingly, artificial agents are inherently limited by the ability of the designer to incorporate the right inductive biases in order to learn from previous experiences.
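To make the abstract's mechanism concrete, the following is a minimal sketch, not the authors' implementation, of meta-learning a neural objective function with second-order gradients, written in JAX. The network shapes, the unit-variance Gaussian policy, the toy trajectory data, and helper names such as init_mlp, learned_objective, and inner_update are illustrative assumptions; the off-policy machinery the paper relies on (e.g., a learned critic and replay buffers) is omitted here.

```python
import jax
import jax.numpy as jnp

def init_mlp(key, sizes):
    # Initialize a small MLP as a list of (W, b) pairs.
    params = []
    for m, n in zip(sizes[:-1], sizes[1:]):
        key, sub = jax.random.split(key)
        params.append((0.1 * jax.random.normal(sub, (m, n)), jnp.zeros(n)))
    return params

def mlp(params, x):
    # Forward pass: tanh hidden layers, linear output layer.
    for W, b in params[:-1]:
        x = jnp.tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b

def policy_log_probs(policy_params, traj):
    # Unit-variance Gaussian policy: log p(a|s) up to an additive constant.
    mean = mlp(policy_params, traj["obs"])
    return -0.5 * jnp.sum((traj["actions"] - mean) ** 2, axis=-1)

def learned_objective(obj_params, policy_params, traj):
    # The neural objective L_alpha: maps per-step trajectory statistics
    # (here: log-probs and returns) to a scalar "loss" for the policy.
    feats = jnp.stack([policy_log_probs(policy_params, traj),
                       traj["returns"]], axis=-1)
    return mlp(obj_params, feats).mean()

def inner_update(obj_params, policy_params, traj, lr=1e-2):
    # One policy-gradient step taken *through* the learned objective.
    grads = jax.grad(learned_objective, argnums=1)(
        obj_params, policy_params, traj)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g,
                                  policy_params, grads)

def meta_loss(obj_params, policy_params, traj):
    # Score the *updated* policy with a simple return-weighted surrogate.
    # Differentiating this w.r.t. obj_params backpropagates through the
    # inner gradient step, producing second-order meta-gradients.
    updated = inner_update(obj_params, policy_params, traj)
    return -(policy_log_probs(updated, traj) * traj["returns"]).mean()

key = jax.random.PRNGKey(0)
policy_params = init_mlp(key, [4, 32, 2])            # obs_dim=4, act_dim=2
obj_params = init_mlp(jax.random.PRNGKey(1), [2, 32, 1])
traj = {  # toy stand-in for sampled trajectories
    "obs": jax.random.normal(key, (64, 4)),
    "actions": jax.random.normal(key, (64, 2)),
    "returns": jax.random.normal(key, (64,)),
}
meta_grads = jax.grad(meta_loss)(obj_params, policy_params, traj)
```

Because meta_loss is differentiated with respect to obj_params through the policy's own gradient step, the resulting meta-gradients are second order, which is where the sample-efficiency benefit claimed in the abstract enters.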
