Making Every Head Count: Sparse Attention Without the Speed-Performance Trade-off