Efficient Reinforcement Learning from Human Feedback via Bayesian Preference Inference