Top-H Decoding: Adapting the Creativity and Coherence with Bounded Entropy in Text Generation

Neural Information Processing Systems 

Large language models (LLMs), despite their impressive performance across a wide range of tasks, often struggle to balance two competing objectives in open-ended text generation: fostering diversity and creativity while preserving logical coherence. Existing truncated sampling techniques, including temperature scaling, top-p (nucleus) sampling, and min-p sampling, aim to manage this trade-off.