NSNQuant: ADouble Normalization Approach for Calibration-Free Low-Bit Vector Quantization of KV Cache

Open in new window