AITopics | best-of-n sampling

BoNBoN Alignment for Large Language Models and the Sweetness of Best-of-n Sampling

Neural Information Processing Systems, May-26-2025, 15:13:41 GMT

This paper concerns the problem of aligning samples from large language models to human preferences using *best-of-n* sampling, where we draw n samples, rank them, and return the best one. We consider two fundamental problems. First: what is the relationship between best-of-n and other (RLHF-type) approaches to aligning LLMs? In particular, when should one be preferred to the other? We show that the best-of-n sampling distribution is essentially equivalent to the policy learned by RLHF if we apply a particular monotone transformation to the reward function.
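The draw, rank, and return procedure described in the abstract above can be sketched in a few lines. This is a minimal illustration, not the paper's code: `best_of_n`, `toy_sample`, and `toy_reward` are hypothetical names, and the toy sampler and length-based reward stand in for a real language model and a learned reward model.

```python
import random
from typing import Callable, List


def best_of_n(sample: Callable[[], str],
              reward: Callable[[str], float],
              n: int) -> str:
    """Draw n candidates from `sample` and return the highest-reward one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates: List[str] = [sample() for _ in range(n)]
    # Ranking all candidates and taking the top one is the same as a max.
    return max(candidates, key=reward)


# Toy stand-ins (assumptions): a real setup would call an LLM and a reward model.
rng = random.Random(0)
VOCAB = ["ok", "good answer", "a much better answer"]


def toy_sample() -> str:
    return rng.choice(VOCAB)


def toy_reward(text: str) -> float:
    # Hypothetical reward: longer text scores higher.
    return float(len(text))


best = best_of_n(toy_sample, toy_reward, n=8)
```

With n = 1 this reduces to ordinary sampling; increasing n shifts the returned sample toward higher reward at the cost of n generations per query.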

artificial intelligence, large language model, natural language, (4 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.99)

Sampling-Efficient Test-Time Scaling: Self-Estimating the Best-of-N Sampling in Early Decoding

Wang, Yiming, Zhang, Pei, Huang, Siyuan, Yang, Baosong, Zhang, Zhuosheng, Huang, Fei, Wang, Rui

arXiv.org Artificial Intelligence, Mar-3-2025

Test-time scaling improves large language model performance by adding extra compute during decoding. Best-of-N (BoN) sampling is a common scaling technique that broadens the search space for finding better solutions from the model distribution. However, traditional BoN requires N full generations, leading to high GPU memory overhead and latency. Moreover, some methods depend on reward models, which adds computational cost and limits domain generalization. In this paper, we propose Self-Truncation Best-of-N (ST-BoN), a novel decoding method that avoids fully generating all samples and eliminates the need for reward models. ST-BoN introduces early sampling consistency to estimate the most promising sample, truncating suboptimal ones to free memory and accelerate inference. This makes test-time scaling more sampling-efficient. Compared to traditional BoN, ST-BoN can reduce dynamic GPU memory overhead by over 90% and latency by 50%, while achieving comparable or even better performance across reasoning and open-ended domains.
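The general idea of choosing one sample early and dropping the rest can be sketched as follows. This is a loose illustration under stated assumptions, not the ST-BoN algorithm: the abstract does not specify how early sampling consistency is computed, so the Jaccard token overlap used here is a swapped-in stand-in, and `start`, `finish`, and all function names are hypothetical.

```python
from typing import Callable, List, Sequence


def jaccard(a: Sequence[str], b: Sequence[str]) -> float:
    """Token-set overlap between two prefixes (assumed consistency proxy)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def pick_most_consistent(prefixes: List[List[str]]) -> int:
    """Return the index of the prefix most similar, on average, to the others."""
    if not prefixes:
        raise ValueError("need at least one prefix")
    if len(prefixes) == 1:
        return 0

    def mean_similarity(i: int) -> float:
        sims = [jaccard(prefixes[i], p) for j, p in enumerate(prefixes) if j != i]
        return sum(sims) / len(sims)

    return max(range(len(prefixes)), key=mean_similarity)


def truncated_best_of_n(start: Callable[[int], List[str]],
                        finish: Callable[[List[str]], str],
                        n: int) -> str:
    """Generate n short prefixes, keep one, and complete only that one."""
    prefixes = [start(i) for i in range(n)]      # cheap partial generations
    keep = pick_most_consistent(prefixes)        # no reward model involved
    return finish(prefixes[keep])                # one full generation, not n
```

The saving in this sketch comes from calling `finish` once instead of n times; how large the saving is in practice depends on how early the prefixes are cut, which the sketch leaves to the caller.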

arxiv preprint arxiv, best-of-n sampling, prime number, (13 more...)

2503.01422

Country: Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models

Chow, Yinlam, Tennenholtz, Guy, Gur, Izzeddin, Zhuang, Vincent, Dai, Bo, Thiagarajan, Sridhar, Boutilier, Craig, Agarwal, Rishabh, Kumar, Aviral, Faust, Aleksandra

arXiv.org Artificial Intelligence, Dec-18-2024

An effective method for improving the performance of large language models (LLMs) is to leverage additional computation at inference time: various works (Hosseini et al., 2024; Kumar et al., 2024; Lightman et al., 2023; Wu et al., 2024) have shown that by using search, re-ranking, multi-turn revision, and more generally, any approach that makes use of more tokens and inference-time compute, the performance of LLMs on various tasks can be significantly improved, so much so that investing in improving inference-time computation might prove more beneficial than increasing model pre-training compute (Snell et al., 2024). Despite this promise, existing work largely treats inference-time computation as an optional post-hoc design choice, applied after conventional pre-training and fine-tuning. However, decoupling training and inference-time computation is not optimal; for example, if we knew that an LLM is allowed to make multiple attempts to solve a math problem, then it may be better to fine-tune it to explore diverse problem-solving strategies, rather than simply generating the candidates that represent the model's best attempt at solving the problem. Within the context of reasoning problems, these performance gains may be significant, as LLMs often fail due to their inability to draw complex inferences about the input and their internal knowledge (Chen et al., 2024). We argue that the effectiveness of inference-time computation can be substantially increased by explicitly considering the inference procedure during training. We study this inference-aware fine-tuning paradigm using the Best-of-N (BoN) inference strategy, where the LLM generates multiple candidate responses, and a verifier selects the best one according to some scoring function (Cobbe et al., 2021).
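A small calculation shows why diversity across attempts matters once multiple attempts are allowed. It rests on two simplifying assumptions that are not from the paper: the verifier is perfect, and each attempt is correct independently with the same probability p. The function name `pass_at_n` is hypothetical.

```python
def pass_at_n(p: float, n: int) -> float:
    """Probability that at least one of n independent attempts is correct."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    # All n attempts fail with probability (1 - p) ** n.
    return 1.0 - (1.0 - p) ** n


# Independent attempts: the success rate grows with n.
independent = pass_at_n(0.5, 2)   # 0.75

# Perfectly correlated attempts (the same answer every time) behave like n = 1.
correlated = pass_at_n(0.5, 1)    # 0.5
```

Under these assumptions a model whose attempts all repeat one strategy gains nothing from extra samples, while a model with diverse attempts does, which is the intuition behind fine-tuning with the inference procedure in mind.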

inference-aware fine-tuning, large language model, machine learning, (14 more...)

2412.15287

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling

Qiu, Jiahao, Lu, Yifu, Zeng, Yifan, Guo, Jiacheng, Geng, Jiayi, Wang, Huazheng, Huang, Kaixuan, Wu, Yue, Wang, Mengdi

arXiv.org Artificial Intelligence, Oct-29-2024

Inference-time alignment enhances the performance of large language models without requiring additional training or fine-tuning, but it is challenging to balance computational efficiency with high-quality output. Best-of-N (BoN) sampling, a simple yet powerful approach, generates multiple responses and selects the best one, achieving improved performance but at a high computational cost. We propose TreeBoN, a novel framework that integrates a speculative tree-search strategy into BoN sampling. TreeBoN maintains a set of parent nodes, iteratively branching and pruning low-quality responses, thereby reducing computational overhead while maintaining high output quality. Our approach also leverages token-level rewards from Direct Preference Optimization (DPO) to guide tree expansion and prune low-quality paths. We evaluate TreeBoN using AlpacaFarm, HH-RLHF, UltraFeedback, GSM8K, and TutorEval datasets, demonstrating consistent improvements. Specifically, TreeBoN achieves the highest win rate of 65% on TutorEval and win rates of around 60% on the other datasets, outperforming standard BoN at the same computational cost and showcasing its scalability and alignment efficacy.
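The loop of keeping a set of parent nodes, branching them, and pruning low-scoring children can be sketched generically. This is an illustration under stated assumptions rather than TreeBoN itself: `extend` and `score` are hypothetical callables standing in for partial generation and for the DPO-derived token-level reward, and `width` and `depth` are assumed parameters.

```python
from typing import Callable, List


def branch_and_prune(roots: List[str],
                     extend: Callable[[str], List[str]],
                     score: Callable[[str], float],
                     width: int,
                     depth: int) -> str:
    """Iteratively expand parent nodes and keep only the top `width` children."""
    if not roots:
        raise ValueError("need at least one root")
    if width < 1 or depth < 0:
        raise ValueError("width must be >= 1 and depth >= 0")
    parents = list(roots)
    for _ in range(depth):
        # Branch: every surviving parent proposes continuations.
        children = [child for parent in parents for child in extend(parent)]
        if not children:
            break
        # Prune: drop low-scoring partial responses before spending more compute.
        children.sort(key=score, reverse=True)
        parents = children[:width]
    return max(parents, key=score)


# Toy usage (assumptions): strings grow by one character, and 'b' is rewarded.
result = branch_and_prune(
    roots=[""],
    extend=lambda s: [s + "a", s + "b"],
    score=lambda s: float(s.count("b")),
    width=2,
    depth=3,
)
```

Pruning after each round bounds the number of partial responses that are extended further to `width`, which is where a tree-structured search can save compute relative to completing every candidate in full.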

enhancing inference-time alignment, reward model, treebon, (10 more...)

2410.16033

Country:

North America > United States > Oregon (0.04)
North America > United States > Michigan (0.04)
Europe > Spain > Aragón (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)
