Faster Neighborhood Attention: Reducing the O(n²) Cost of Self Attention at the Threadblock Level
