Batch Active Learning of Reward Functions from Human Preferences
