Imitation Learning from Imperfect Demonstration

Wu, Yueh-Hua; Charoenphakdee, Nontawat; Bao, Han; Tangkaratt, Voot; Sugiyama, Masashi

arXiv.org Machine Learning 

Imitation learning (IL) has attracted great interest because obtaining demonstrations is usually easier than designing a reward function. A reward is a signal that instructs an agent to complete the desired task, but ill-designed reward functions often lead to unexpected behaviors [Amodei et al., 2016; Dewey, 2014; Everitt and Hutter, 2016]. There are two main approaches to IL: behavioral cloning (BC) [Schaal, 1999], which uses supervised learning to train an action predictor directly on demonstration data; and apprenticeship learning (AL), which attempts to find a policy that is better than the expert policy for a class of cost functions [Abbeel and Ng, 2004]. Although BC can be trained directly with supervised learning, it has been shown that BC cannot imitate the expert policy without a large amount of demonstration data, because it does not take the environment's transition dynamics into account [Ross et al., 2011].
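The supervised-learning view of BC can be illustrated with a minimal sketch. It assumes a linear policy fitted by least squares to noise-free state-action pairs from a hypothetical linear expert; the names `behavioral_cloning`, `policy`, and `W_expert` are illustrative and not taken from the paper. The fit uses only the demonstration pairs and never consults the environment's transitions.

```python
import numpy as np


def behavioral_cloning(states, actions):
    """Fit a linear policy a = s @ W to demonstration pairs by least squares.

    states:  array of shape (n_samples, state_dim)
    actions: array of shape (n_samples, action_dim)
    """
    W, *_ = np.linalg.lstsq(states, actions, rcond=None)
    return W


def policy(W, state):
    """Predict an action for a single state with the cloned linear policy."""
    return state @ W


# Hypothetical expert (an assumption for illustration): a fixed linear map
# from 3-dimensional states to 2-dimensional actions.
W_expert = np.array([[1.0, -0.5],
                     [0.3, 2.0],
                     [-1.2, 0.7]])

# Demonstration data: states visited by the expert and the actions it took.
rng = np.random.default_rng(0)
states = rng.normal(size=(200, 3))
actions = states @ W_expert  # noise-free demonstrations

# BC is plain supervised regression on (state, action) pairs.
W_bc = behavioral_cloning(states, actions)
predicted_action = policy(W_bc, np.ones(3))
```

With noise-free demonstrations and a policy class that contains the expert, the fit recovers the expert's weights up to floating-point error. With imperfect demonstrations or a mismatched policy class it generally does not.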
