BoNBoN Alignment for Large Language Models and the Sweetness of Best-of-n Sampling

Neural Information Processing Systems 

This paper concerns the problem of aligning samples from large language models to human preferences using *best-of-n* sampling, where we draw n samples, rank them, and return the best one. We consider two fundamental problems. First: what is the relationship between best-of-n and other (RLHF-type) approaches to aligning LLMs? In particular, when should one be preferred to the other? We show that the best-of-n sampling distribution is essentially equivalent to the policy learned by RLHF if we apply a particular monotone transformation to the reward function.
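
For concreteness, the following is a minimal sketch of best-of-n sampling as described above. The helpers `sample_fn` and `reward_fn` are hypothetical stand-ins for a base-model sampler and a learned reward model; neither is specified here, so treat this as an illustration of the procedure rather than the paper's implementation.

```python
from typing import Callable, List

def best_of_n(
    prompt: str,
    sample_fn: Callable[[str], str],         # hypothetical: draws one completion from the base model
    reward_fn: Callable[[str, str], float],  # hypothetical: scores a (prompt, completion) pair
    n: int = 8,
) -> str:
    """Draw n i.i.d. completions, rank them by reward, and return the best one."""
    candidates: List[str] = [sample_fn(prompt) for _ in range(n)]
    # Rank by reward and return the argmax (ties broken arbitrarily).
    return max(candidates, key=lambda c: reward_fn(prompt, c))
```

Note that the procedure never updates the base model's weights; it reshapes the sampling distribution purely at inference time, which is what makes its relationship to RLHF-trained policies a natural question.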