Reducing Transformer Key-Value Cache Size with Cross-Layer Attention

William Brandon

Neural Information Processing Systems 

Key-value (KV) caching plays an essential role in accelerating decoding for transformer-based autoregressive large language models (LLMs).
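
To make the role of KV caching concrete, here is a minimal, hypothetical PyTorch sketch of a per-layer cache used during autoregressive decoding. None of this is the paper's code: the class name `KVCache`, the tensor shapes, and the final cache-sharing lines are illustrative assumptions. The last two lines only gesture at the idea suggested by the title, that cross-layer attention shrinks total cache size by letting multiple layers reuse one set of keys and values.

```python
import torch

# Hypothetical sketch (not the paper's code): a cache that stores each new
# token's key/value projections so earlier tokens are never re-projected
# at later decoding steps.
class KVCache:
    def __init__(self):
        self.keys = []    # each entry: (batch, heads, 1, head_dim)
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)

    def get(self):
        # Concatenate along the sequence axis for use in attention.
        return torch.cat(self.keys, dim=2), torch.cat(self.values, dim=2)

# Usage during decoding: one new token's projections arrive per step.
cache = KVCache()
for step in range(3):
    k_new = torch.randn(1, 8, 1, 64)  # illustrative shapes
    v_new = torch.randn(1, 8, 1, 64)
    cache.append(k_new, v_new)
K, V = cache.get()  # (1, 8, 3, 64): all cached keys/values so far

# Per the title, cross-layer attention would let several layers attend
# using one shared cache (an assumption about the mechanism, sketched only):
shared = KVCache()
layer_caches = {0: shared, 1: shared}  # layers 0 and 1 reuse the same K/V
```

Because memory for a standard cache grows with layers × heads × sequence length, sharing a cache across layers as sketched above directly cuts the layer factor, which is the kind of reduction the title refers to.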
