GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM

Open in new window