Bayesian Reward Models for LLM Alignment
