Non-Adversarial Imitation Learning and its Connections to Adversarial Methods

Arenz, Oleg, Neumann, Gerhard

arXiv.org Machine Learning 

Imitation learning (IL, Schaal, 1999; Osa et al., 2018) and inverse reinforcement learning (IRL, Ng and Russell, 2000) are two related areas of research that aim to teach agents by providing demonstrations of the desired behavior. Whereas imitation learning aims to learn a policy that results in similar behavior, inverse reinforcement learning focuses on inferring a reward function that might have been optimized by the demonstrator, aiming to better generalize to different environments. Both areas of research are often formalized as distribution matching, that is, the learned policy (or the optimal policy for IRL) should induce a distribution over states and actions that is close to the expert's distribution with respect to a given (usually non-metric) distance. Commonly applied distances are the forward Kullback-Leibler (KL) divergence (e.g., Ziebart, 2010), which maximizes the likelihood of the demonstrated state-action pairs under the agent's distribution, and the reverse Kullback-Leibler (RKL) divergence (e.g., Arenz et al., 2016; Fu et al., 2018; Ghasemipour et al., 2020), which minimizes the expected discrimination information (Kullback and Leibler, 1951) of state-action pairs sampled from the agent's distribution. However, since the emergence of generative adversarial networks (GANs, Goodfellow et al., 2014) as a solution technique for both areas, other divergences have been investigated, such as the Jensen-Shannon divergence (Ho and Ermon, 2016), the Wasserstein distance (Xiao et al., 2019), and general f-divergences (Ke et al., 2019; Ghasemipour et al., 2020).
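
To make the difference between these divergences concrete, the following minimal sketch computes the forward KL, reverse KL, and Jensen-Shannon divergences for two small discrete distributions. The distributions `expert` and `agent` and the helper names `kl` and `js` are illustrative assumptions, not taken from the paper, and a real state-action distribution would have to be estimated from samples rather than written down.

```python
import math

def kl(p, q):
    """KL(p || q) for discrete distributions given as equal-length lists.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their mixture."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical distributions over three state-action pairs.
expert = [0.7, 0.2, 0.1]
agent = [0.4, 0.4, 0.2]

# Forward KL: expectation under the expert's distribution. Minimizing it
# over the agent maximizes the agent's likelihood of expert samples.
forward_kl = kl(expert, agent)

# Reverse KL: expectation under the agent's own distribution.
reverse_kl = kl(agent, expert)

# Jensen-Shannon: symmetric and bounded above by log(2).
js_div = js(expert, agent)

print(f"forward KL = {forward_kl:.4f}")
print(f"reverse KL = {reverse_kl:.4f}")
print(f"JS         = {js_div:.4f}")
```

The two KL values differ because the divergence is asymmetric, which is one reason it is not a metric, whereas swapping the arguments of `js` leaves its value unchanged.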
