ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification

Yefei He¹, Weijia Wu²

Neural Information Processing Systems 

The KV cache stores the key and value states of previous tokens to avoid recomputation, yet it demands substantial storage space, especially for long sequences. Adaptive KV cache compression seeks to discern the saliency of tokens, preserving vital information while aggressively compressing the less important tokens.
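
To make the idea of adaptive compression concrete, the following is a minimal sketch of saliency-guided mixed-precision quantization of cached states. It is an illustration under stated assumptions, not the method of this paper: the saliency scores are taken as a given input (how they are computed is not specified here), and the function names `quantize_dequantize` and `compress_kv`, the parameters `salient_ratio`, `high_bits`, and `low_bits`, and the choice of per-token uniform quantization are all hypothetical.

```python
import numpy as np


def quantize_dequantize(x, n_bits):
    """Uniform asymmetric quantization of each row (token), then dequantization.

    Assumption: one scale and offset per token; other granularities are possible.
    """
    levels = 2 ** n_bits - 1
    lo = x.min(axis=-1, keepdims=True)
    hi = x.max(axis=-1, keepdims=True)
    # Guard against constant rows, where hi == lo would give a zero scale.
    scale = np.where(hi > lo, (hi - lo) / levels, 1.0)
    q = np.clip(np.round((x - lo) / scale), 0, levels)
    return q * scale + lo


def compress_kv(states, saliency, salient_ratio=0.25, high_bits=4, low_bits=2):
    """Simulate mixed-precision compression of cached key or value states.

    states:   float array of shape (n_tokens, head_dim)
    saliency: float array of shape (n_tokens,), higher means more important
              (hypothetical input; the scoring rule is not defined here)
    Returns the dequantized states and a boolean mask of salient tokens.
    """
    n_tokens = states.shape[0]
    n_salient = max(1, int(round(salient_ratio * n_tokens)))
    mask = np.zeros(n_tokens, dtype=bool)
    mask[np.argsort(saliency)[-n_salient:]] = True

    out = np.empty_like(states)
    out[mask] = quantize_dequantize(states[mask], high_bits)    # preserve salient tokens
    out[~mask] = quantize_dequantize(states[~mask], low_bits)   # compress the rest harder
    return out, mask
```

Calling `compress_kv(states, saliency)` on an `(n_tokens, head_dim)` array returns the reconstructed states together with the mask of tokens kept at the higher bit-width. The sketch only simulates the quantization error; a real cache would store the integer codes, scales, and offsets rather than the dequantized values.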