KV-CAR: KV Cache Compression using Autoencoders and KV Reuse in Large Language Models
