KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference with Coupled Quantization

Neural Information Processing Systems

Furthermore, we demonstrate that CQ can reasonably preserve model quality even with the KV cache quantized down to 1 bit.