Multi-Agent Generative Adversarial Imitation Learning

Jiaming Song, Hongyu Ren, Dorsa Sadigh, Stefano Ermon

Neural Information Processing Systems

If the reward function does not cover all important aspects of the task, the agent could easily learn undesirable behaviors [4].