Compressing KV Cache for Long-Context LLM Inference with Inter-Layer Attention Similarity