More Bang for the Buck: Improving the Inference of Large Language Models at a Fixed Budget using Reset and Discard (ReD)
Sagi Meir, Tommer D. Keidar, Noam Levi, Shlomi Reuveni, Barak Hirshberg
The performance of large language models (LLMs) on verifiable tasks is usually measured by pass@k, the probability of answering a question correctly at least once in k trials. At a fixed budget, a more suitable metric is coverage@cost, the average number of unique questions answered as a function of the total number of attempts. We connect the two metrics and show that the empirically observed power-law behavior of pass@k leads to sublinear growth of coverage@cost (diminishing returns). To address this problem, we propose Reset-and-Discard (ReD), a method for querying LLMs that increases coverage@cost at any given budget, regardless of the functional form of pass@k. Moreover, given a pass@k curve, we can quantitatively predict the savings in the total number of attempts achieved by ReD. If pass@k is not available for a model, ReD can infer its power-law exponent. Experiments on three LLMs using HumanEval demonstrate that ReD substantially reduces the attempts, tokens, and USD cost required to reach a desired coverage, while also offering an efficient way to measure inference power laws.
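To make the pass@k vs. coverage@cost relation concrete, here is a minimal, hypothetical sketch (not the paper's code). It assumes an illustrative power-law form pass@k ≈ 1 - a·k^(-b) with made-up constants a and b, a fixed question pool size N_QUESTIONS, and a naive policy that splits the attempt budget evenly across questions; the paper's actual definitions and fits may differ. The point it illustrates is the diminishing returns named in the abstract: each tenfold increase in budget buys fewer additional solved questions.

```python
# Hypothetical illustration (not the paper's code): a power-law pass@k implies
# sublinear coverage@cost under a naive "equal attempts per question" policy.
import numpy as np

N_QUESTIONS = 164          # e.g. the size of HumanEval
a, b = 0.6, 0.5            # illustrative power-law parameters (assumption)

def pass_at_k(k: np.ndarray) -> np.ndarray:
    """Assumed power-law form of pass@k, clipped to [0, 1]."""
    return np.clip(1.0 - a * np.power(k, -b), 0.0, 1.0)

def coverage_at_cost(total_attempts: np.ndarray) -> np.ndarray:
    """Expected unique questions solved when the budget is split evenly,
    i.e. each question receives k = total_attempts / N_QUESTIONS tries."""
    k = total_attempts / N_QUESTIONS
    return N_QUESTIONS * pass_at_k(k)

if __name__ == "__main__":
    budgets = np.array([164, 1_640, 16_400, 164_000])
    for c, cov in zip(budgets, coverage_at_cost(budgets)):
        print(f"budget {c:>7d} attempts -> ~{cov:6.1f} unique questions solved")
    # Successive 10x budget increases add progressively fewer solved questions,
    # i.e. coverage@cost grows sublinearly with the total number of attempts.
```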
Jan-30-2026