SqueezeAttention: 2D Management of KV-Cache in LLM Inference via Layer-wise Optimal Budget