State Alignment-based Imitation Learning
Fangchen Liu, Zhan Ling, Tongzhou Mu, Hao Su
ABSTRACT

Consider an imitation learning problem in which the imitator and the expert have different dynamics models. Most current imitation learning methods fail in this setting because they focus on imitating actions. We propose a novel state-alignment-based imitation learning method that trains the imitator to follow the state sequences in expert demonstrations as closely as possible. The state alignment comes from both local and global perspectives, and we combine them into a reinforcement learning framework through a regularized policy update objective. We show the superiority of our method both in standard imitation learning settings and in settings where the expert and imitator have different dynamics models.

1 INTRODUCTION

Learning from demonstrations (imitation learning, abbr. ...

Imitation learning methods can generally be divided into two categories: behavior cloning (BC) and inverse reinforcement learning (IRL). Behavior cloning (Ross et al., 2011b) formulates a supervised learning problem: learn a policy that maps states to actions from demonstration trajectories. Inverse reinforcement learning (Russell, 1998; Ng et al., 2000) tries to find a reward function that induces the given demonstration trajectories. GAIL (Ho & Ermon, 2016) and its variants (Fu et al.; Qureshi et al., 2018; Xiao et al., 2019) are recently proposed IRL-based methods, which use a GAN-based reward to align the distribution of state-action pairs between the expert and the imitator.
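To make the supervised-learning view of behavior cloning concrete, the following is a minimal sketch, not the method proposed in this paper. It assumes a linear policy fitted by least squares on synthetic (state, action) pairs; the names `behavior_cloning_linear`, `policy`, and `true_W`, as well as the data itself, are hypothetical and chosen only for illustration.

```python
import numpy as np

def behavior_cloning_linear(states, actions):
    """Fit a linear policy a = [s, 1] @ W by least squares on expert pairs."""
    # Append a constant column so the policy can learn a bias term.
    X = np.hstack([states, np.ones((states.shape[0], 1))])
    W, *_ = np.linalg.lstsq(X, actions, rcond=None)
    return W

def policy(W, state):
    """Predict an action for a single state with the fitted linear policy."""
    return np.append(state, 1.0) @ W

# Synthetic "expert" demonstrations (an assumption, not data from the paper):
# 2-dimensional states, 1-dimensional actions, generated by a known linear map.
rng = np.random.default_rng(0)
true_W = np.array([[2.0], [-1.0], [0.5]])  # two state weights plus a bias
demo_states = rng.normal(size=(200, 2))
demo_actions = np.hstack([demo_states, np.ones((200, 1))]) @ true_W

W = behavior_cloning_linear(demo_states, demo_actions)
```

Because the fit targets the expert's actions directly, a policy learned this way would be expected to transfer poorly when the imitator's dynamics differ from the expert's, which is the failure mode the abstract describes.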
Nov-21-2019