Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference