Efficient Reinforcement Learning from Human Feedback via Bayesian Preference Inference
