OBCache: Optimal Brain KV Cache Pruning for Efficient Long-Context LLM Inference