KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference with Coupled Quantization

Feb-7-2026, 10:04:28 GMT–Neural Information Processing Systems

Furthermore, we demonstrate that CQ can preserve model quality reasonably with KV cache quantized down to 1 bit.

large language model, machine learning, natural language

Country:
- North America > United States
  - Texas > Harris County
    - Houston (0.04)
  - New Jersey > Hudson County
    - Hoboken (0.04)

Genre:
- Research Report (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (1.00)
  - Natural Language > Large Language Model (0.30)
