ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification
Yefei He, Weijia Wu
Neural Information Processing Systems
The KV cache stores key and value states from previous tokens to avoid recomputation, but it demands substantial storage space, especially for long sequences. Adaptive KV cache compression estimates the saliency of each token, preserving vital information while aggressively compressing tokens of less importance.
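To make the storage cost concrete, the following is a minimal sketch of the arithmetic behind KV cache size and saliency-based mixed precision. It is not the ZipCache method itself. The function names, the model dimensions (32 layers, 32 heads, head dimension 128, 4096 tokens), the 4-bit/2-bit split, and the toy saliency scores are all illustrative assumptions, not values taken from the paper.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   bits_per_value=16, batch_size=1):
    """Bytes needed to cache keys and values for every token.

    Each token stores one key vector and one value vector (hence the
    factor of 2) per head, per layer.
    """
    values = 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size
    return values * bits_per_value // 8


def mixed_precision_bits(salient_fraction, salient_bits, regular_bits):
    """Average bits per cached value when salient tokens get more bits."""
    return (salient_fraction * salient_bits
            + (1 - salient_fraction) * regular_bits)


def top_k_salient(scores, fraction):
    """Indices of the highest-scoring tokens (hypothetical saliency scores).

    Keeps at least one token; returns indices in ascending order.
    """
    k = max(1, round(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Hypothetical model shape, chosen only to make the numbers easy to check.
fp16_bytes = kv_cache_bytes(num_layers=32, num_kv_heads=32,
                            head_dim=128, seq_len=4096)
print(fp16_bytes / 2**30, "GiB at 16 bits per value")

# Assumed policy: half the tokens at 4 bits, the rest at 2 bits.
avg_bits = mixed_precision_bits(0.5, salient_bits=4, regular_bits=2)
print(avg_bits, "average bits per value")

# Toy saliency scores for four tokens; the top half are kept at high precision.
print(top_k_salient([0.1, 0.9, 0.3, 0.7], fraction=0.5))
```

Under these assumed numbers, the cache grows linearly with sequence length, which is why lowering the average bit width for less salient tokens reduces memory while keeping more precision where it matters.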
May-30-2025, 10:18:08 GMT