Is Best-of-N the Best of Them? Coverage, Scaling, and Optimality in Inference-Time Alignment

Huang, Audrey, Block, Adam, Liu, Qinghua, Jiang, Nan, Krishnamurthy, Akshay, Foster, Dylan J.

Apr-7-2025–arXiv.org Machine Learning

Inference-time computation offers a powerful axis for scaling the performance of language models. However, naively increasing computation in techniques like Best-of-N sampling can lead to performance degradation due to reward hacking. Toward a theoretical understanding of how to best leverage additional computation, we focus on inference-time alignment, which we formalize as the problem of improving the quality of responses drawn from a pre-trained policy, given a prompt of interest and access to an imperfect reward model. We analyze the performance of inference-time alignment algorithms in terms of (i) response quality, and (ii) compute, and provide new results that highlight the importance of the pre-trained policy's coverage over high-quality responses for performance and compute scaling: 1. We show that Best-of-$N$ alignment with an ideal choice for $N$ can achieve optimal performance under stringent notions of coverage, but provably suffers from reward hacking when $N$ is large, and fails to achieve tight guarantees under more realistic coverage conditions. 2. We introduce $\texttt{InferenceTimePessimism}$, a new algorithm which mitigates reward hacking through deliberate use of inference-time compute, implementing the principle of pessimism in the face of uncertainty via rejection sampling; we prove that its performance is optimal and does not degrade with $N$, meaning it is scaling-monotonic. We complement our theoretical results with an experimental evaluation that demonstrate the benefits of $\texttt{InferenceTimePessimism}$ across a variety of tasks and models.

inferencetimepessimism, large language model, machine learning, (21 more...)

arXiv.org Machine Learning

Apr-7-2025

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - Illinois (0.04)
- Europe > Slovenia
  - Drava > Municipality of Benedikt > Benedikt (0.04)
- Asia > Middle East
  - Jordan (0.04)

Genre:
- Research Report > New Finding (0.66)

Industry:
- Education (0.67)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Optimization (0.67)
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (0.93)
  - Machine Learning > Neural Networks
    - Deep Learning (0.68)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found