Sparse Attention Acceleration with Synergistic In-Memory Pruning and On-Chip Recomputation
