Reward learning from human preferences and demonstrations in Atari