HACK: Homomorphic Acceleration via Compression of the Key-Value Cache for Disaggregated LLM Inference
