Iterative Reward Shaping using Human Feedback for Correcting Reward Misspecification
