Models of human preference for learning reward functions
