Reward learning from human preferences and demonstrations in Atari

Borja Ibarz, Jan Leike, Tobias Pohlen, Geoffrey Irving, Shane Legg, Dario Amodei

Neural Information Processing Systems 