Improving Reinforcement Learning from Human Feedback with Efficient Reward Model Ensemble
