SMILe: Scalable Meta Inverse Reinforcement Learning through Context-Conditional Policies
Ghasemipour, Seyed Kamyar Seyed; Gu, Shixiang (Shane); Zemel, Richard
Neural Information Processing Systems
Imitation Learning (IL) has been successfully applied to complex sequential decision-making problems where standard Reinforcement Learning (RL) algorithms fail. A number of recent methods extend IL to few-shot learning scenarios, where a meta-trained policy learns to quickly master new tasks using limited demonstrations. However, although Inverse Reinforcement Learning (IRL) often outperforms Behavioral Cloning (BC) in terms of imitation quality, most of these approaches build on BC due to its simple optimization objective. In this work, we propose SMILe, a scalable framework for Meta Inverse Reinforcement Learning (Meta-IRL) based on maximum entropy IRL, which can learn high-quality policies from few demonstrations. We examine the efficacy of our method on a variety of high-dimensional simulated continuous control tasks and observe that SMILe significantly outperforms Meta-BC.
Mar-18-2020, 23:47:01 GMT