Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference