Foundations of Top-k Decoding for Language Models

Jun-21-2026, 12:09:10 GMT–Neural Information Processing Systems

Top-kdecoding is a widely used method for sampling from LLMs: at each token, only the largest k next-token-probabilities are kept, and the next token is sampled after renormalizing them to sum to unity. Top-kand other sampling methods are motivated by the intuition that true next-token distributions are sparse, and the noisy LLM probabilities need to be truncated. However, to our knowledge, a precise theoretical motivation for the use of top-k decoding is missing. In this work, we develop a theoretical framework that both explains and generalizes top-k decoding. We view decoding at a fixed token as the recovery of a sparse probability distribution. We introduce Bregman decoders obtained by minimizing a separable Bregman divergence (for both the primal and dual cases) with a sparsity-inducing ℓ0-regularization; in particular, these decoders are adaptive in the sense that the sparsity parameter k is chosen depending on the underlying token distribution. Despite the combinatorial nature of the sparse Bregman objective, we show how to optimize it efficiently for a large class of divergences. We prove that (i) the optimal decoding strategies are greedy, and further that (ii) the objective is discretely convex in k, such that the optimal k can be identified in logarithmic time. We note that standard top-k decoding arises as a special case for the KL divergence, and construct new decoding strategies with substantially different behaviors (e.g., non-linearly up-weighting larger probabilities after renormalization).

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Jun-21-2026, 12:09:10 GMT

Conferences PDF

Add feedback

Country:
- Europe (0.45)

Genre:
- Research Report > Experimental Study (1.00)

Industry:
- Health & Medicine (0.45)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Representation & Reasoning > Optimization (0.92)
  - Machine Learning
    - Statistical Learning (0.92)
    - Neural Networks > Deep Learning (0.68)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found