SqueezeAttention: 2D Management of KV-Cache in LLM Inference via Layer-wise Optimal Budget