Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
William Brandon
Neural Information Processing Systems
Key-value (KV) caching plays an essential role in accelerating decoding for transformer-based autoregressive large language models (LLMs).
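To make the role of the KV cache concrete, here is a minimal illustrative sketch (not the paper's implementation) of single-head autoregressive decoding with a per-layer key-value cache. All names (`KVCache`, `attention`) and the use of NumPy are assumptions for illustration; the point is that the cache grows by one key and one value vector per layer per decoded token, which is the memory cost that cross-layer sharing of keys and values aims to reduce.

```python
import numpy as np

def attention(q, K, V):
    """Scaled dot-product attention for one query over all cached positions.
    q: (d,) query for the current token; K, V: (t, d) cached keys/values."""
    scores = K @ q / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V  # (d,) attention output

class KVCache:
    """Per-layer cache of past keys and values, grown one token per decode step."""
    def __init__(self, n_layers, d):
        self.K = [np.empty((0, d)) for _ in range(n_layers)]
        self.V = [np.empty((0, d)) for _ in range(n_layers)]

    def append(self, layer, k, v):
        # Append this step's key/value for `layer`, return the full cache
        # so attention can attend over every previously decoded position.
        self.K[layer] = np.vstack([self.K[layer], k[None, :]])
        self.V[layer] = np.vstack([self.V[layer], v[None, :]])
        return self.K[layer], self.V[layer]

    def size_floats(self):
        # Total floats stored: 2 * n_layers * d per decoded token.
        return sum(K.size + V.size for K, V in zip(self.K, self.V))
```

In this sketch the cache holds distinct keys and values for every layer; sharing them across adjacent layers, as the title's cross-layer attention suggests, would shrink `size_floats` by the sharing factor.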