Bayesian Reward Models for LLM Alignment