Imitation with Neural Density Models
Kuno Kim, Akshat Jindal, Yang Song, Jiaming Song, Yanan Sui, Stefano Ermon
arXiv.org Artificial Intelligence
Imitation Learning (IL) algorithms aim to learn optimal behavior by mimicking expert demonstrations. Perhaps the simplest IL method is Behavioral Cloning (BC) [Pomerleau, 1991], which ignores the dynamics of the underlying Markov Decision Process (MDP) that generated the demonstrations and treats IL as a supervised learning problem of predicting optimal actions given states. Prior work showed that even if the learned policy incurs a small BC loss, the worst-case performance gap between the expert and imitator grows quadratically with the number of decision steps [Ross and Bagnell, 2010, Ross et al., 2011a]. The crux of their argument is that policies that are "close" as measured by BC loss can induce disastrously different distributions over states when deployed in the environment. One family of solutions for mitigating such compounding errors is Interactive IL [Ross et al., 2011b, 2013, Guo et al., 2014], which involves running the imitator's policy and collecting corrective actions from an interactive expert. However, interactive expert queries can be expensive and are seldom available. Another family of approaches [Ho and Ermon, 2016, Fu et al., 2017, Ke et al., 2020, Kostrikov et al., 2020, Kim and Park, 2018, Wang et al., 2017] that has gained much traction directly minimizes a statistical distance between the state-action distributions induced by the expert's and imitator's policies, i.e., the occupancy measures ρ
Oct-19-2020