ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification Yefei He
Neural Information Processing Systems
KV cache compression seeks to discern the saliency of tokens, preserving vital information while aggressively compressing those of less importance.
Oct-10-2025, 07:14:25 GMT