WindowKV: Task-Adaptive Group-Wise KV Cache Window Selection for Efficient LLM Inference
