Fast Best-of-N Decoding via Speculative Rejection
Hanshi Sun

Oct-11-2025, 00:17:44 GMT–Neural Information Processing Systems

The safe and effective deployment of Large Language Models (LLMs) involves a critical step called alignment, which ensures that the model's responses are in accordance with human preferences. Prevalent alignment techniques, such as DPO, PPO, and their variants, align LLMs by changing the pre-trained model weights during a phase called post-training. While predominant, these post-training methods add substantial complexity before LLMs can be deployed. Inference-time alignment methods avoid the complex post-training step and instead bias the generation towards responses that are aligned with human preferences. The best-known inference-time alignment method, called Best-of-N, is as effective as the state-of-the-art post-training procedures. Unfortunately, Best-of-N requires vastly more resources at inference time than standard decoding strategies, which makes it computationally impractical.
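To make the cost of Best-of-N concrete, here is a minimal sketch of the standard procedure: sample N complete responses, score each with a reward model, and return the highest-scoring one. The names `best_of_n`, `generate`, and `reward` are hypothetical placeholders chosen for illustration. They do not come from the paper or from any particular library, and the toy generator and length-based reward below stand in for a real LLM and a real reward model.

```python
import random
from typing import Callable, List


def best_of_n(
    prompt: str,
    generate: Callable[[str], str],
    reward: Callable[[str, str], float],
    n: int,
) -> str:
    """Sample n full responses and return the one with the highest reward.

    `generate` and `reward` are hypothetical callables: in practice they
    would wrap a language model and a reward model, respectively.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # Every candidate is decoded to completion before any scoring happens.
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    # Keep only the candidate that the reward function rates highest.
    return max(candidates, key=lambda response: reward(prompt, response))


if __name__ == "__main__":
    rng = random.Random(0)
    canned = ["ok", "a longer reply", "short", "the most detailed reply of all"]

    def toy_generate(prompt: str) -> str:
        # Stand-in for sampling from an LLM.
        return rng.choice(canned)

    def toy_reward(prompt: str, response: str) -> float:
        # Stand-in for a reward model: longer responses score higher.
        return float(len(response))

    print(best_of_n("Explain alignment.", toy_generate, toy_reward, n=8))
```

In this sketch the work grows linearly with N: each call performs N full generations and N reward evaluations and then discards all but one response, which is the inference-time overhead the abstract refers to.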


Country:
- Africa > Niger (0.04)
- North America > United States
  - Virginia (0.04)
  - Pennsylvania > Allegheny County
    - Pittsburgh (0.04)
  - California
    - Santa Clara County > Palo Alto (0.04)
    - Alameda County > Berkeley (0.04)

Genre:
- Research Report > Experimental Study (0.93)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
