MLKV: Multi-Layer Key-Value Heads for Memory Efficient Transformer Decoding
