ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification

Yefei He

Neural Information Processing Systems 

KV cache compression seeks to discern the saliency of tokens, preserving vital information while aggressively compressing tokens of lesser importance.
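To make the idea of saliency-aware compression concrete, the following is a minimal sketch in plain Python, not the method of this paper. It assumes that a per-token saliency score is already available (how such a score is computed is not specified in this excerpt) and applies uniform min-max quantization at a higher bit-width to salient tokens and a lower bit-width to the rest. The names `quantize`, `compress_kv`, `keep_ratio`, `hi_bits`, and `lo_bits` are hypothetical and introduced only for illustration.

```python
def quantize(values, bits):
    """Uniform min-max quantize-dequantize of a list of floats.

    Illustrative only: maps each value to the nearest of 2**bits
    evenly spaced levels between min(values) and max(values).
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)  # constant vector: nothing to quantize
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels
    return [round((v - lo) / scale) * scale + lo for v in values]


def compress_kv(cache, saliency, keep_ratio=0.5, hi_bits=4, lo_bits=2):
    """Quantize per-token vectors with a saliency-dependent bit-width.

    cache:      list of per-token vectors (lists of floats)
    saliency:   one score per token (assumed given; hypothetical input)
    keep_ratio: fraction of tokens treated as salient (assumption)
    Returns the dequantized cache and the set of salient token indices.
    """
    n_salient = max(1, round(keep_ratio * len(cache)))
    ranked = sorted(range(len(cache)), key=lambda i: saliency[i], reverse=True)
    salient = set(ranked[:n_salient])
    out = [
        quantize(vec, hi_bits if i in salient else lo_bits)
        for i, vec in enumerate(cache)
    ]
    return out, salient


def max_error(original, restored):
    """Largest absolute reconstruction error for one token vector."""
    return max(abs(a - b) for a, b in zip(original, restored))


if __name__ == "__main__":
    cache = [[0.0, 0.1, 0.5, 1.0], [0.0, 0.1, 0.5, 1.0]]
    saliency = [0.9, 0.1]  # token 0 is treated as the salient one
    restored, salient = compress_kv(cache, saliency)
    for i, (orig, rec) in enumerate(zip(cache, restored)):
        tag = "salient" if i in salient else "regular"
        print(i, tag, max_error(orig, rec))
```

In this toy setup the two tokens hold identical values, so any difference in reconstruction error comes only from the bit-width chosen by the saliency ranking: the salient token is reconstructed more accurately than the non-salient one.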