Training AI: Reward is not enough