Accurate KV Cache Eviction via Anchor Direction Projection for Efficient LLM Inference

Jun-14-2026, 18:46:50 GMT–Neural Information Processing Systems

Key-Value (KV) cache eviction, which retains the KV pairs of the most important tokens and discards the rest, is a critical technique for reducing both memory usage and inference latency in large language models (LLMs). However, existing approaches often rely on simple heuristics, such as attention weights, to measure token importance, overlooking the spatial relationships between token value states in the vector space. This often leads to suboptimal token selection and thus degraded performance. To tackle this problem, we propose AnDPro (Anchor Direction Projection), a method that introduces a projection-based scoring function to measure token importance more accurately. Specifically, AnDPro operates in the space of value vectors and uses the projections of these vectors onto an "Anchor Direction" (the direction of the pre-eviction output) to measure token importance and guide token selection. Experiments on 16 datasets from the LongBench benchmark demonstrate that AnDPro maintains 96.07% of the full-cache accuracy using only a 3.44% KV cache budget, reducing the KV cache budget by 46.0% relative to previous state-of-the-art methods without compromising quality.
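The projection idea described above can be illustrated with a minimal sketch for a single attention head and a single query. The abstract does not state the scoring formula, so everything here is an assumption made for illustration: the score is taken to be each token's attention weight times the scalar projection of its value vector onto the unit vector along the pre-eviction output, and the function names `anchor_direction`, `projection_scores`, and `select_tokens` are hypothetical, not taken from the paper.

```python
import math

def anchor_direction(weights, values):
    """Unit vector along the pre-eviction output o = sum_i weights[i] * values[i]."""
    dim = len(values[0])
    output = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    norm = math.sqrt(sum(x * x for x in output))
    if norm == 0.0:
        raise ValueError("pre-eviction output is the zero vector; no direction defined")
    return [x / norm for x in output]

def projection_scores(weights, values):
    """ASSUMED score (not from the paper): weight * projection of value onto anchor."""
    anchor = anchor_direction(weights, values)
    return [w * sum(a * x for a, x in zip(anchor, v)) for w, v in zip(weights, values)]

def select_tokens(weights, values, budget):
    """Indices of the `budget` highest-scoring tokens, returned in original order."""
    scores = projection_scores(weights, values)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])

# Toy data: token 1 has the largest attention weight, but its value vector
# points against the output direction, so it receives the lowest score.
weights = [0.3, 0.4, 0.3]
values = [[1.0, 0.0], [-0.5, 0.0], [1.0, 0.2]]
kept = select_tokens(weights, values, budget=2)
```

In this toy case a pure attention-weight heuristic would keep token 1 first, while the assumed projection score keeps tokens 0 and 2. Under this particular scoring choice the scores also sum to the norm of the pre-eviction output, so each score can be read as that token's signed contribution along the anchor direction.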

arxiv preprint, large language model, machine learning


Genre:
- Research Report
  - New Finding (1.00)
  - Experimental Study (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.46)
