BoNBoN Alignment for Large Language Models and the Sweetness of Best-of-n Sampling
Neural Information Processing Systems
This paper concerns the problem of aligning samples from large language models to human preferences using *best-of-n* sampling, where we draw n samples, rank them, and return the best one. We consider two fundamental problems. First: what is the relationship between best-of-n and other (RLHF-type) approaches to aligning LLMs? In particular, when should one be preferred to the other? We show that the best-of-n sampling distribution is essentially equivalent to the policy learned by RLHF if we apply a particular monotone transformation to the reward function.
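The best-of-n procedure described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `generate` and `reward` are hypothetical stand-ins for an LLM sampler and a learned reward model.

```python
import itertools

def best_of_n(generate, reward, prompt, n):
    """Draw n samples from `generate`, score each with `reward`,
    and return the highest-scoring one."""
    samples = [generate(prompt) for _ in range(n)]
    return max(samples, key=reward)

# Toy usage: a deterministic stand-in "sampler" cycling through canned
# responses, with string length as a stand-in reward function.
_canned = itertools.cycle(["ok", "good", "great"])
result = best_of_n(lambda prompt: next(_canned), len, "hi", n=3)
# result == "great" (the longest of the three drawn samples)
```

Note that the procedure only uses the reward to rank samples, so it is invariant to monotone transformations of the reward, which is why its induced distribution can match an RLHF policy trained on a transformed reward.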
May-26-2025, 15:13:41 GMT