MiniCache: KV Cache Compression in Depth Dimension for Large Language Models

Open in new window