Keyformer: KV Cache Reduction through Key Tokens Selection for Efficient Generative Inference
