SparQ Attention: Bandwidth-Efficient LLM Inference
