Primal Wasserstein Imitation Learning

Robert Dadashi, Léonard Hussenot, Matthieu Geist, Olivier Pietquin

arXiv.org Machine Learning 

Reinforcement Learning (RL) has solved a number of difficult tasks, whether in games [Tesauro, 1995, Mnih et al., 2015, Silver et al., 2016] or in robotics [Abbeel and Ng, 2004, Andrychowicz et al., 2020]. However, RL relies on the existence of a reward function, which can be either hard to specify or too sparse to be used in practice. Imitation Learning (IL) is a paradigm that applies to these environments with hard-to-specify rewards: we seek to solve a task by learning a policy from a fixed number of demonstrations generated by an expert. IL methods can typically be folded into two paradigms: Behavioral Cloning [Pomerleau, 1991, Bagnell et al., 2007, Ross and Bagnell, 2010] and Inverse Reinforcement Learning [Russell, 1998, Ng et al., 2000]. In Behavioral Cloning, we recover the expert's behavior by directly learning a policy that matches it in some sense. In Inverse Reinforcement Learning (IRL), we assume that the demonstrations come from an agent acting optimally with respect to an unknown reward function, which we seek to recover in order to subsequently train an agent on it. IRL methods thus introduce an intermediary problem to solve, namely recovering the reward function.
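
As a concrete illustration of the Behavioral Cloning paradigm described above, the following is a minimal sketch that fits a policy to expert (state, action) pairs by supervised regression. The network architecture, the synthetic expert data, and the use of PyTorch with a mean-squared-error loss are assumptions made for this example, not details taken from the paper.

```python
# Minimal behavioral-cloning sketch: supervised regression of expert actions
# from states. All sizes and data below are illustrative assumptions.
import torch
import torch.nn as nn

state_dim, action_dim = 8, 2

# Stand-in for a fixed set of expert demonstrations, i.e. (state, action) pairs.
expert_states = torch.randn(1000, state_dim)
expert_actions = torch.randn(1000, action_dim)

# Deterministic policy network pi(s) -> a.
policy = nn.Sequential(
    nn.Linear(state_dim, 64), nn.ReLU(),
    nn.Linear(64, action_dim),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Behavioral cloning: minimize the discrepancy between the policy's actions
# and the expert's actions on the demonstrated states.
for step in range(2000):
    idx = torch.randint(0, expert_states.shape[0], (64,))
    loss = nn.functional.mse_loss(policy(expert_states[idx]), expert_actions[idx])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

IRL methods, by contrast, would use the demonstrations to learn a reward function first and then train an agent on that reward with standard RL.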
