On the Global Convergence of Imitation Learning: A Case for Linear Quadratic Regulator
Cai, Qi; Hong, Mingyi; Chen, Yongxin; Wang, Zhaoran
Imitation learning is a paradigm that learns to perform a task from expert demonstrations. The most straightforward approach to imitation learning is behavioral cloning (Pomerleau, 1991), which learns from expert trajectories to predict the expert action at any state. Despite its simplicity, behavioral cloning ignores the accumulation of prediction error over time. Consequently, although the learned policy closely resembles the expert policy at any given point in time, their trajectories may diverge in the long term. To remedy the issue of error accumulation, inverse reinforcement learning (Russell, 1998; Ng and Russell, 2000; Abbeel and Ng, 2004; Ratliff et al., 2006; Ziebart et al., 2008; Ho and Ermon, 2016) jointly learns a reward function and the corresponding optimal policy, such that the expected cumulative
Jan-11-2019
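The abstract's claim that behavioral cloning ignores the accumulation of prediction error can be made concrete with a toy scalar linear system. The sketch below is a hypothetical illustration, not the authors' method: the helper `rollout` and the constants `A`, `B`, `X0`, `T`, `K_EXPERT`, and `K_CLONED` are assumptions chosen so that the cloned feedback gain nearly matches the expert's actions on the expert's own states, yet makes the closed loop unstable.

```python
# Toy illustration (hypothetical values, not from the paper) of how a small
# per-step imitation error can compound in a scalar linear system
#     x_{t+1} = a * x_t + b * u_t
# under linear state-feedback policies u_t = -k * x_t.


def rollout(a: float, b: float, k: float, x0: float, horizon: int) -> list[float]:
    """Simulate the closed loop x_{t+1} = (a - b*k) x_t and return all states."""
    xs = [x0]
    for _ in range(horizon):
        u = -k * xs[-1]  # linear state-feedback action
        xs.append(a * xs[-1] + b * u)
    return xs


A, B, X0, T = 1.2, 1.0, 1.0, 200  # open-loop unstable system (|A| > 1)
K_EXPERT = 0.25                   # closed loop A - B*K = 0.95 (stable)
K_CLONED = 0.19                   # closed loop A - B*K = 1.01 (unstable)

expert = rollout(A, B, K_EXPERT, X0, T)
cloned = rollout(A, B, K_CLONED, X0, T)

# Behavioral-cloning-style error: the action mismatch is measured only on
# the states visited by the expert.
one_step_err = max(abs(K_EXPERT * x - K_CLONED * x) for x in expert)

# Long-horizon gap between the two trajectories after T steps.
final_gap = abs(cloned[-1] - expert[-1])

print(f"max per-step action error on expert states: {one_step_err:.3f}")
print(f"state gap after {T} steps: {final_gap:.3f}")
```

On the expert's states the action mismatch never exceeds |K_EXPERT - K_CLONED| times the initial state, while the cloned policy's own trajectory drifts away from the expert's; this compounding effect is the issue that inverse reinforcement learning is meant to remedy.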