Making Every Head Count: Sparse Attention Without the Speed-Performance Trade-off
