SVDq: 1.25-bit and 410x Key Cache Compression for LLM Attention
