A Preliminary Study on the Promises and Challenges of Native Top-$k$ Sparse Attention
