Improving Reinforcement Learning from Human Feedback Using Contrastive Rewards