HiFC: High-efficiency Flash-based KV Cache Swapping for Scaling LLM Inference
