Nearest Neighbor Speculative Decoding for LLM Generation and Attribution

May-27-2025, 09:14:02 GMT–Neural Information Processing Systems

Large language models (LLMs) often hallucinate and lack the ability to provide attribution for their generations. Semi-parametric LMs, such as kNN-LM, approach these limitations by refining the output of an LM for a given prompt using its nearest neighbor matches in a non-parametric data store. However, these models often exhibit slow inference speeds and produce non-fluent texts. In this paper, we introduce Nearest Neighbor Speculative Decoding (NEST), a novel semi-parametric language modeling approach that is capable of incorporating real-world text spans of arbitrary length into the LM generations and providing attribution to their sources. NEST performs token-level retrieval at each inference step to compute a semi-parametric mixture distribution and identify promising span continuations in a corpus.

chen, llm generation and attribution, nearest neighbor speculative decoding

Neural Information Processing Systems

May-27-2025, 09:14:02 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning (1.00)
  - Representation & Reasoning > Case-Based Reasoning (0.88)