Reviews: State Aware Imitation Learning

Neural Information Processing Systems 

This paper proposes a framework for MAP-estimation-based imitation learning, which can be seen as adding to the standard supervised learning loss a cost for deviating from the observed state distribution. The key idea for optimizing such a loss is a gradient expression for the stationary distribution of a given policy that can be estimated online from policy rollouts. This idea is an extension of a previous result by Morimura et al. (2010). I find the approach original and potentially interesting to the NIPS community. The consequences of deviating from the demonstrated states in imitation learning have been recognized earlier, but this paper proposes a novel approach to this problem.
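
For concreteness, my reading of the objective is roughly the following (the notation, including the trade-off weight $\lambda$, is mine and should be checked against the paper):

  $\mathcal{L}(\theta) \;=\; -\sum_{(s,a)\in D} \log \pi_\theta(a \mid s) \;-\; \lambda \sum_{s\in D} \log d^{\pi_\theta}(s),$

where $D$ is the set of demonstrated state-action pairs and $d^{\pi_\theta}$ is the stationary state distribution induced by the policy $\pi_\theta$. The first term is the standard supervised imitation loss; the second penalizes policies whose stationary distribution assigns low probability to demonstrated states. The technical contribution, as I understand it, is an online estimator of $\nabla_\theta \log d^{\pi_\theta}(s)$ from on-policy rollouts, extending the result of Morimura et al. (2010).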