SnapKV: LLM Knows What You Are Looking for Before Generation
Bowen Yang
–Neural Information Processing Systems
Large Language Models (LLMs) have made remarkable progress in processing extensive contexts, with the Key-Value (KV) cache playing a vital role in enhancing their performance. However, the growth of the KV cache with increasing input length poses challenges to memory and time efficiency. To address this problem, this paper introduces SnapKV, an innovative, fine-tuning-free approach that efficiently minimizes KV cache size while still delivering comparable accuracy in real-world applications. We discover that each attention head in the model consistently focuses on specific prompt attention features during generation. Moreover, this robust pattern can be obtained from an 'observation' window located at the end of the prompt.