Sampling-Efficient Test-Time Scaling: Self-Estimating the Best-of-N Sampling in Early Decoding

Jun-14-2026, 07:02:29 GMT–Neural Information Processing Systems

Test-time scaling enhances large language model performance by allocating additional compute resources during decoding. Best-of-$N$ (BoN) sampling serves as a common sampling-based scaling technique, broadening the search space in parallel to find better solutions from the model distribution. However, its cost-performance trade-off is still underexplored. Two main challenges limit the efficiency of BoN sampling: (1) Generating $N$ full samples consumes substantial GPU memory, reducing inference capacity under limited resources.

artificial intelligence, natural language, proceedings, (6 more...)

Neural Information Processing Systems

Jun-14-2026, 07:02:29 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Natural Language (0.59)