AQUA: Attention via QUery mAgnitudes for Memory and Compute Efficient Inference in LLMs
