CompressKV: Semantic Retrieval Heads Know What Tokens are Not Important Before Generation

Open in new window