ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification
Yefei He

Oct-10-2025, 07:14:25 GMT–Neural Information Processing Systems

KV cache compression seeks to discern the saliency of tokens, preserving vital information while aggressively compressing those of less importance.
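The idea of keeping salient tokens at high precision while compressing the rest can be illustrated with a small sketch. It makes several assumptions that go beyond what is stated here. Saliency is scored from a causal attention matrix by summing the attention each key token receives and dividing by the number of queries that can see it, which offsets the bias of raw accumulated scores toward early tokens. The sketch then applies per-token uniform min-max quantization, using 8 bits for salient tokens and 2 bits for the others. This is a generic mixed-precision scheme, not necessarily ZipCache's exact metric or quantizer. All function names (`token_saliency`, `quantize`, `compress_kv`, `compression_ratio`) are hypothetical.

```python
import math
from typing import List, Set, Tuple

Matrix = List[List[float]]


def token_saliency(attn: Matrix) -> List[float]:
    """Score each key token by the attention it receives (assumed metric).

    attn[i][j] is the causal attention weight of query i on key j, so
    attn[i][j] == 0 for j > i. Key j is visible to queries j..n-1, so the
    column sum is divided by that count to avoid favouring early tokens.
    """
    n = len(attn)
    scores = []
    for j in range(n):
        col_sum = sum(attn[i][j] for i in range(j, n))
        visible = n - j  # number of queries that can attend to key j
        scores.append(col_sum / visible)
    return scores


def quantize(values: List[float], bits: int) -> List[float]:
    """Uniform asymmetric min-max quantization; returns dequantized values."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    return [lo + round((v - lo) / scale) * scale for v in values]


def compress_kv(kv: Matrix, attn: Matrix, salient_ratio: float = 0.25,
                hi_bits: int = 8, lo_bits: int = 2) -> Tuple[Matrix, Set[int]]:
    """Quantize salient tokens with hi_bits and all other tokens with lo_bits."""
    scores = token_saliency(attn)
    k = max(1, math.ceil(salient_ratio * len(kv)))
    ranked = sorted(range(len(kv)), key=lambda j: scores[j], reverse=True)
    salient = set(ranked[:k])
    out = [quantize(vec, hi_bits if j in salient else lo_bits)
           for j, vec in enumerate(kv)]
    return out, salient


def compression_ratio(n_tokens: int, n_salient: int, hi_bits: int,
                      lo_bits: int, fp_bits: int = 16) -> float:
    """Ratio versus an fp_bits cache, ignoring scale/zero-point overhead."""
    avg_bits = (n_salient * hi_bits + (n_tokens - n_salient) * lo_bits) / n_tokens
    return fp_bits / avg_bits


if __name__ == "__main__":
    attn = [[1.0, 0.0, 0.0, 0.0],
            [0.5, 0.5, 0.0, 0.0],
            [0.2, 0.2, 0.6, 0.0],
            [0.1, 0.1, 0.1, 0.7]]
    kv = [[0.0, 0.3, 1.0], [0.1, 0.2, 0.9],
          [0.5, -0.5, 0.25], [0.05, 0.7, -0.3]]
    _, salient = compress_kv(kv, attn)
    print(sorted(salient))                             # [3]
    print(f"{compression_ratio(4, len(salient), 8, 2):.2f}")  # 4.57
```

In this toy example, raw accumulated attention would rank token 0 highest (column sum 1.8). The normalized score instead selects token 3 (0.7 versus 0.45), which shows why the choice of saliency metric matters. The compression ratio here counts only payload bits. A real quantizer also stores per-group scales and zero points, which lowers the effective ratio.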

Keywords: attention score, compression ratio, quantization

Country:
- Oceania > Australia (0.04)
- Asia
  - China (0.04)
  - Singapore > Central Region
    - Singapore (0.04)

Genre:
- Research Report
  - Experimental Study (0.93)
  - New Finding (0.93)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (0.69)
  - Machine Learning > Neural Networks
    - Deep Learning (0.69)

Duplicate Docs:
- 7e57131fdeb815764434b65162c88895-Paper-Conference.pdf