CHAI: Clustered Head Attention for Efficient LLM Inference