Reviews: Meta-Inverse Reinforcement Learning with Probabilistic Context Variables

Neural Information Processing Systems 

The paper identifies the open problem of meta-inverse reinforcement learning (meta-IRL): learning a reward function for an unseen task from a single expert trajectory for that task, using a batch of expert trajectories from different but related tasks as training data (the task solved by each training trajectory is not communicated to the learning algorithm). Because IRL is used rather than imitation learning, a reward function is learned for each task, or rather a single reward function parameterized by a latent context variable m that is intended to capture the task. The paper then formulates a framework for training neural networks to solve this problem, building on prior work on adversarial IRL (AIRL) and adding latent task variables to handle variation across tasks. An inference network q_psi is used to infer the task variable from a demonstration.
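
To make the setup concrete, here is a minimal sketch (in PyTorch) of the two components the summary describes: an inference network q_psi mapping a demonstration to a latent context m, and a single reward function conditioned on m. The class names (ContextEncoder, ContextualReward), the mean-pooling over transitions, and the hidden sizes are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    """q_psi: infers the latent task variable m from a demonstration.

    Assumed design: encode each (state, action) transition with an MLP,
    then mean-pool over the trajectory to get a permutation-invariant
    summary, from which Gaussian posterior parameters are produced.
    """
    def __init__(self, state_dim, action_dim, latent_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mu = nn.Linear(hidden, latent_dim)
        self.log_std = nn.Linear(hidden, latent_dim)

    def forward(self, states, actions):
        # states: (T, state_dim), actions: (T, action_dim)
        h = self.net(torch.cat([states, actions], dim=-1)).mean(dim=0)
        return self.mu(h), self.log_std(h)  # parameters of q_psi(m | tau)


class ContextualReward(nn.Module):
    """r_theta(s, a, m): one reward network shared across tasks,
    conditioned on the latent context m rather than trained per task."""
    def __init__(self, state_dim, action_dim, latent_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action, m):
        return self.net(torch.cat([state, action, m], dim=-1))
```

Under these assumptions, meta-test time would look roughly as follows: the single demonstration of the unseen task is passed through q_psi to obtain m (e.g., the posterior mean), and r_theta(·, ·, m) then serves as a task-specific reward that standard RL can optimize.

```python
# Hypothetical meta-test usage with toy dimensions
enc = ContextEncoder(state_dim=4, action_dim=2, latent_dim=8)
rew = ContextualReward(state_dim=4, action_dim=2, latent_dim=8)
states, actions = torch.randn(50, 4), torch.randn(50, 2)
mu, _ = enc(states, actions)                 # infer context from one demo
r = rew(states, actions, mu.expand(50, 8))   # per-step rewards, shape (50, 1)
```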