AITopics

Technology: Information Technology > Artificial Intelligence (0.84)

Neural Information Processing SystemsFeb-18-2026, 07:44:34 GMT

Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass Ethan Shen Alan Fan Sarah Pratt Jae Sung Park Matthew Wallingford Sham Kakade Ari Holtzman Ranjay Krishna

Many applications today provide users with multiple auto-complete drafts as they type, including GitHub's code completion, Gmail's smart compose, and Apple's messaging auto-suggestions.

large language model, machine learning, superposed decoding, (18 more...)

Country:

Oceania > Australia > New South Wales (0.05)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > India > NCT > New Delhi (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Education (0.94)
Health & Medicine > Therapeutic Area (0.69)
Information Technology > Services (0.46)
Health & Medicine > Consumer Health (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Neural Information Processing SystemsOct-10-2025, 18:04:29 GMT

Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass Ethan Shen Alan Fan Sarah Pratt Jae Sung Park Matthew Wallingford Sham Kakade Ari Holtzman Ranjay Krishna

Many applications today provide users with multiple auto-complete drafts as they type, including GitHub's code completion, Gmail's smart compose, and Apple's messaging auto-suggestions.

decoding, nucleus sampling, superposed decoding, (14 more...)

Country:

Oceania > Australia > New South Wales (0.05)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > India > NCT > New Delhi (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Education (0.94)
Health & Medicine > Therapeutic Area (0.69)
Information Technology > Services (0.46)
Health & Medicine > Consumer Health (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Neural Information Processing SystemsMay-27-2025, 18:41:54 GMT

Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass

multiple generation, single autoregressive inference pass, superposed decoding, (2 more...)

Technology: Information Technology > Artificial Intelligence (0.88)

Borec, Luka, Sadler, Philipp, Schlangen, David

The Unreasonable Ineffectiveness of Nucleus Sampling on Mitigating Text Memorization

arXiv.org Artificial IntelligenceAug-29-2024

This work analyses the text memorization behavior of large language models (LLMs) when subjected to nucleus sampling. Stochastic decoding methods like nucleus sampling are typically applied to overcome issues such as monotonous and repetitive text generation, which are often observed with maximization-based decoding techniques. We hypothesize that nucleus sampling might also reduce the occurrence of memorization patterns, because it could lead to the selection of tokens outside the memorized sequence. To test this hypothesis we create a diagnostic dataset with a known distribution of duplicates that gives us some control over the likelihood of memorization of certain parts of the training data. Our analysis of two GPT-Neo models fine-tuned on this dataset interestingly shows that (i) an increase of the nucleus size reduces memorization only modestly, and (ii) even when models do not engage in "hard" memorization -- a verbatim reproduction of training samples -- they may still display "soft" memorization whereby they generate outputs that echo the training data but without a complete one-by-one resemblance.

mitigating text memorization, nucleus sampling, unreasonable ineffectiveness

2408.16345

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

Manevich, Avshalom, Tsarfaty, Reut

Mitigating Hallucinations in Large Vision-Language Models (LVLMs) via Language-Contrastive Decoding (LCD)

arXiv.org Artificial IntelligenceAug-6-2024

Large Vision-Language Models (LVLMs) are an extension of Large Language Models (LLMs) that facilitate processing both image and text inputs, expanding AI capabilities. However, LVLMs struggle with object hallucinations due to their reliance on text cues and learned object co-occurrence biases. While most research quantifies these hallucinations, mitigation strategies are still lacking. Our study introduces a Language Contrastive Decoding (LCD) algorithm that adjusts LVLM outputs based on LLM distribution confidence levels, effectively reducing object hallucinations. We demonstrate the advantages of LCD in leading LVLMs, showing up to %4 improvement in POPE F1 scores and up to %36 reduction in CHAIR scores on the COCO validation set, while also improving captioning quality scores. Our method effectively improves LVLMs without needing complex post-processing or retraining, and is easily applicable to different models. Our findings highlight the potential of further exploration of LVLM-specific decoding algorithms.

hallucination, lvlm, wang, (15 more...)

2408.04664

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Michigan (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
(3 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Furniturewala, Shaz, Jaidka, Kokil, Sharma, Yashvardhan

Impact of Decoding Methods on Human Alignment of Conversational LLMs

arXiv.org Artificial IntelligenceJul-28-2024

To be included into chatbot systems, Large language models (LLMs) must be aligned with human conversational conventions. However, being trained mainly on web-scraped data gives existing LLMs a voice closer to informational text than actual human speech. In this paper, we examine the effect of decoding methods on the alignment between LLM-generated and human conversations, including Beam Search, Top K Sampling, and Nucleus Sampling. We present new measures of alignment in substance, style, and psychometric orientation, and experiment with two conversation datasets. Our results provide subtle insights: better alignment is attributed to fewer beams in Beam Search and lower values of P in Nucleus Sampling. We also find that task-oriented and open-ended datasets perform differently in terms of alignment, indicating the significance of taking into account the context of the interaction.

alignment, dataset, sampling, (16 more...)

2407.19526

Country:

Asia > Singapore (0.05)
Asia > China > Beijing > Beijing (0.04)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > New Finding (0.48)
Research Report > Experimental Study (0.47)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

arXiv.org Artificial IntelligenceJul-2-2024

Predicting vs. Acting: A Trade-off Between World Modeling & Agent Modeling

Li, Margaret, Shi, Weijia, Pagnoni, Artidoro, West, Peter, Holtzman, Ari

RLHF-aligned LMs have shown unprecedented ability on both benchmarks and long-form text generation, yet they struggle with one foundational task: next-token prediction. As RLHF models become agent models aimed at interacting with humans, they seem to lose their world modeling -- the ability to predict what comes next in arbitrary documents, which is the foundational training objective of the Base LMs that RLHF adapts. Besides empirically demonstrating this trade-off, we propose a potential explanation: to perform coherent long-form generation, RLHF models restrict randomness via implicit blueprints. In particular, RLHF models concentrate probability on sets of anchor spans that co-occur across multiple generations for the same prompt, serving as textual scaffolding but also limiting a model's ability to generate documents that do not include these spans. We study this trade-off on the most effective current agent models, those aligned with RLHF, while exploring why this may remain a fundamental trade-off between models that act and those that predict, even as alignment techniques improve.

base model, probability, rlhf model, (13 more...)

2407.02446

Country:

North America > United States > Maryland > Baltimore (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > Canada > British Columbia (0.04)
Europe > Germany > Berlin (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceJun-24-2024

Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass

Shen, Ethan, Fan, Alan, Pratt, Sarah M., Park, Jae Sung, Wallingford, Matthew, Kakade, Sham M., Holtzman, Ari, Krishna, Ranjay, Farhadi, Ali, Kusupati, Aditya

Many applications today provide users with multiple auto-complete drafts as they type, including GitHub's code completion, Gmail's smart compose, and Apple's messaging auto-suggestions. Under the hood, language models support this by running an autoregressive inference pass to provide a draft. Consequently, providing $k$ drafts to the user requires running an expensive language model $k$ times. To alleviate the computation cost of running $k$ inference passes, we propose Superposed Decoding, a new decoding algorithm that generates $k$ drafts at the computation cost of one autoregressive inference pass. We achieve this by feeding a superposition of the most recent token embeddings from the $k$ drafts as input to the next decoding step of the language model. At every inference step we combine the $k$ drafts with the top-$k$ tokens to get $k^2$ new drafts and cache the $k$ most likely options, using an n-gram interpolation with minimal compute overhead to filter out incoherent generations. Our experiments show that $k$ drafts from Superposed Decoding are at least as coherent and factual as Nucleus Sampling and Greedy Decoding respectively, while being at least $2.44\times$ faster for $k\ge3$. In a compute-normalized setting, user evaluations demonstrably favor text generated by Superposed Decoding over Nucleus Sampling. Code and more examples open-sourced at https://github.com/RAIVNLab/SuperposedDecoding.

decoding, nucleus sampling, superposed decoding, (14 more...)

2405.184

Country:

Oceania > Australia > New South Wales (0.05)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > India > NCT > New Delhi (0.04)
(5 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Education (0.94)
Health & Medicine > Therapeutic Area (0.69)
Health & Medicine > Consumer Health (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial IntelligenceJun-7-2024

Constrained Decoding for Secure Code Generation

Fu, Yanjun, Baker, Ethan, Ding, Yu, Chen, Yizheng

Code Large Language Models (Code LLMs) have been increasingly used by developers to boost productivity, but they often generate vulnerable code. Thus, there is an urgent need to ensure that code generated by Code LLMs is correct and secure. Previous research has primarily focused on generating secure code, overlooking the fact that secure code also needs to be correct. This oversight can lead to a false sense of security. Currently, the community lacks a method to measure actual progress in this area, and we need solutions that address both security and correctness of code generation. This paper introduces a new benchmark, CodeGuard+, along with two new metrics, to measure Code LLMs' ability to generate both secure and correct code. Using our new evaluation methods, we show that the state-of-the-art defense technique, prefix tuning, may not be as strong as previously believed, since it generates secure code but sacrifices functional correctness. We also demonstrate that different decoding methods significantly affect the security of Code LLMs. Furthermore, we explore a new defense direction: constrained decoding for secure code generation. We propose new constrained decoding techniques to generate secure code. Our results reveal that constrained decoding is more effective than prefix tuning to improve the security of Code LLMs, without requiring a specialized training dataset. Moreover, our evaluations over eight state-of-the-art Code LLMs show that constrained decoding has strong performance to improve the security of Code LLMs, and our technique outperforms GPT-4.

code llm, constraint, sampling, (16 more...)

2405.00218

Country: North America > United States > Maryland (0.04)

Genre: Research Report > New Finding (0.88)

Industry: Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)