Improving Reinforcement Learning from Human Feedback Using Contrastive Rewards
