Imitation Learning from Imperfect Demonstration
Wu, Yueh-Hua; Charoenphakdee, Nontawat; Bao, Han; Tangkaratt, Voot; Sugiyama, Masashi
Imitation learning (IL) has attracted great interest because obtaining demonstrations is usually easier than designing a reward. A reward is a signal that instructs an agent to complete the desired task; however, ill-designed reward functions often lead to unexpected behaviors [Amodei et al., 2016; Dewey, 2014; Everitt and Hutter, 2016]. There are two main approaches to IL: behavioral cloning (BC) [Schaal, 1999], which uses supervised learning to train an action predictor directly on demonstration data; and apprenticeship learning (AL), which attempts to find a policy that performs at least as well as the expert policy for a class of cost functions [Abbeel and Ng, 2004]. Even though BC can be trained directly with supervised learning, it has been shown that BC cannot imitate the expert policy without a large amount of demonstration data, because it ignores the environment's transition dynamics: small prediction errors lead the agent into states the demonstrations do not cover, where its mistakes compound [Ross et al., 2011].
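To make the "BC is supervised learning" point concrete, here is a minimal sketch assuming discrete, hashable states and actions. It is not the method of this work, and the function names `fit_tabular_bc` and `act` are hypothetical. The policy is the maximum-likelihood tabular classifier: for each demonstrated state, it predicts the action the demonstrator chose most often. States absent from the demonstrations get no prediction, which reflects the coverage problem described above.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Optional, Tuple

State = Hashable
Action = Hashable


def fit_tabular_bc(demos: Iterable[Tuple[State, Action]]) -> Dict[State, Action]:
    """Hypothetical tabular behavioral cloning.

    Treats each (state, action) pair as a labeled example and, for every
    state seen in the demonstrations, keeps the most frequent action
    (the maximum-likelihood choice for a tabular categorical policy).
    """
    counts: Dict[State, Counter] = defaultdict(Counter)
    for state, action in demos:
        counts[state][action] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}


def act(policy: Dict[State, Action], state: State) -> Optional[Action]:
    """Return the cloned action, or None for a state with no demonstration.

    The learned policy has no label for unseen states, so it has nothing
    to imitate once the agent drifts off the demonstrated trajectories.
    """
    return policy.get(state)


if __name__ == "__main__":
    demos = [("s0", "right"), ("s1", "right"), ("s1", "right"),
             ("s1", "up"), ("s2", "up")]
    policy = fit_tabular_bc(demos)
    print(act(policy, "s1"))  # right
    print(act(policy, "s3"))  # None
```

Because every demonstrated action is treated as a correct label, any suboptimal actions in the data are imitated in proportion to how often they appear. A real BC system would replace the lookup table with a function approximator, but the training signal is the same.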
Jan-29-2019