ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification

Yefei He

Neural Information Processing Systems 

KV cache compression seeks to discern the saliency of tokens, preserving vital information while aggressively compressing those of less importance.
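
As a rough illustration of this idea, the following minimal sketch quantizes a toy KV cache at two precisions, keeping more bits for tokens with higher saliency scores. It is not the method from the paper: the saliency scores are assumed to be given, and the function names, the `salient_ratio` parameter, the 4-bit/2-bit split, and the per-token min-max quantizer are all hypothetical choices made for the example.

```python
from typing import List


def quantize_dequantize(vec: List[float], bits: int) -> List[float]:
    """Uniform min-max quantization of one vector, returned in dequantized form."""
    lo, hi = min(vec), max(vec)
    if hi == lo:
        return list(vec)  # constant vector: nothing to quantize
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels
    return [lo + round((x - lo) / scale) * scale for x in vec]


def compress_kv(
    cache: List[List[float]],
    saliency: List[float],
    salient_ratio: float = 0.5,  # assumed fraction of tokens kept at high precision
    high_bits: int = 4,          # assumed bit-width for salient tokens
    low_bits: int = 2,           # assumed bit-width for the remaining tokens
) -> List[List[float]]:
    """Quantize each token's vector, using more bits for the most salient tokens."""
    n_salient = max(1, round(len(cache) * salient_ratio))
    ranked = sorted(range(len(cache)), key=lambda i: saliency[i], reverse=True)
    salient = set(ranked[:n_salient])
    return [
        quantize_dequantize(vec, high_bits if i in salient else low_bits)
        for i, vec in enumerate(cache)
    ]


# Two tokens with identical vectors; only the first is marked salient.
cache = [[0.0, 0.1, 0.5, 1.0], [0.0, 0.1, 0.5, 1.0]]
saliency = [0.9, 0.1]
restored = compress_kv(cache, saliency)
errors = [max(abs(a - b) for a, b in zip(orig, rec)) for orig, rec in zip(cache, restored)]
print(errors)  # the salient token should show the smaller reconstruction error
```

In this toy setup the salient token is reconstructed with a smaller worst-case error than the non-salient one, which is the trade-off the sentence above describes: precision is spent where the saliency estimate says it matters.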
