Sparse Attention Remapping with Clustering for Efficient LLM Decoding on PIM
