SMILe: Scalable Meta Inverse Reinforcement Learning through Context-Conditional Policies

Ghasemipour, Seyed Kamyar Seyed, Gu, Shixiang (Shane), Zemel, Richard

Mar-18-2020, 23:47:01 GMT–Neural Information Processing Systems

Imitation Learning (IL) has been successfully applied to complex sequential decision-making problems where standard Reinforcement Learning (RL) algorithms fail. A number of recent methods extend IL to few-shot learning scenarios, where a meta-trained policy learns to quickly master new tasks using limited demonstrations. However, although Inverse Reinforcement Learning (IRL) often outperforms Behavioral Cloning (BC) in terms of imitation quality, most of these approaches build on BC due to its simple optimization objective. In this work, we propose SMILe, a scalable framework for Meta Inverse Reinforcement Learning (Meta-IRL) based on maximum entropy IRL, which can learn high-quality policies from few demonstrations. We examine the efficacy of our method on a variety of high-dimensional simulated continuous control tasks and observe that SMILe significantly outperforms Meta-BC.

inverse reinforcement learning, meta inverse reinforcement learning, reinforcement learning

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)