Bridging the Imitation Gap by Adaptive Insubordination

Neural Information Processing Systems 

In practice, imitation learning is preferred over pure reinforcement learning whenever it is possible to design a teaching agent to provide expert supervision. However, we show that when the teaching agent makes decisions with access to privileged information that is unavailable to the student, this information is marginalized during imitation learning, resulting in an imitation gap and, potentially, poor results.