Reducing Transformer Key-Value Cache Size with Cross-Layer Attention

William Brandon

Neural Information Processing Systems 

Key-value (KV) caching plays an essential role in accelerating decoding for transformer-based autoregressive large language models (LLMs).
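
To make the role of KV caching concrete, here is a minimal, hypothetical PyTorch sketch of a per-layer cache used during autoregressive decoding. None of this is the paper's code: the class name `KVCache`, the tensor shapes, and the final cache-sharing lines are illustrative assumptions. The last two lines only gesture at the idea suggested by the title, that cross-layer attention shrinks total cache size by letting multiple layers reuse one set of keys and values.

```python
import torch

# Hypothetical sketch (not the paper's code): a cache that stores each new
# token's key/value projections so earlier tokens are never re-projected
# at later decoding steps.
class KVCache:
    def __init__(self):
        self.keys = []    # each entry: (batch, heads, 1, head_dim)
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)

    def get(self):
        # Concatenate along the sequence axis for use in attention.
        return torch.cat(self.keys, dim=2), torch.cat(self.values, dim=2)

# Usage during decoding: one new token's projections arrive per step.
cache = KVCache()
for step in range(3):
    k_new = torch.randn(1, 8, 1, 64)  # illustrative shapes
    v_new = torch.randn(1, 8, 1, 64)
    cache.append(k_new, v_new)
K, V = cache.get()  # (1, 8, 3, 64): all cached keys/values so far

# Per the title, cross-layer attention would let several layers attend
# using one shared cache (an assumption about the mechanism, sketched only):
shared = KVCache()
layer_caches = {0: shared, 1: shared}  # layers 0 and 1 reuse the same K/V
```

Because memory for a standard cache grows with layers × heads × sequence length, sharing a cache across layers as sketched above directly cuts the layer factor, which is the kind of reduction the title refers to.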
