Watch, Try, Learn: Meta-Learning from Demonstrations and Reward
Allan Zhou, Eric Jang, Daniel Kappler, Alex Herzog, Mohi Khansari, Paul Wohlhart, Yunfei Bai, Mrinal Kalakrishnan, Sergey Levine, Chelsea Finn
arXiv.org Artificial Intelligence
Imitation learning allows agents to learn complex behaviors from demonstrations. However, learning a complex vision-based task may require an impractical number of demonstrations. Meta-imitation learning is a promising approach towards enabling agents to learn a new task from one or a few demonstrations by leveraging experience from learning similar tasks. In the presence of task ambiguity or unobserved dynamics, demonstrations alone may not provide enough information; an agent must also try the task to successfully infer a policy. In this work, we propose a method that can learn to learn from both demonstrations and trial-and-error experience with sparse reward feedback. In comparison to meta-imitation, this approach enables the agent to effectively and efficiently improve itself autonomously beyond the demonstration data. In comparison to meta-reinforcement learning, we can scale to substantially broader distributions of tasks, as the demonstration reduces the burden of exploration. Our experiments show that our method significantly outperforms prior approaches on a set of challenging, vision-based control tasks.
Jun-7-2019