Robust Imitation of a Few Demonstrations with a Backwards Model

Neural Information Processing Systems 

Behavior cloning of expert demonstrations can speed up the learning of optimal policies, requiring fewer samples than reinforcement learning.