Faster Neighborhood Attention: Reducing the O(n^2) Cost of Self Attention at the Threadblock Level

Oct-10-2025, 06:30:35 GMT–Neural Information Processing Systems

Neighborhood attention reduces the cost of self attention by restricting each token's attention span to its nearest neighbors. This restriction, parameterized by a window size and a dilation factor, draws a spectrum of possible attention patterns between linear projection and self attention. Neighborhood attention, and sliding window attention patterns more generally, have long been bounded by infrastructure, particularly in higher-rank spaces (2-D and 3-D). This has called for the development of custom kernels, which have been limited in functionality, performance, or both.
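To make the window size and dilation parameters concrete, here is a minimal 1-D sketch in plain Python of which tokens a given token attends to. The function name `neighborhood_indices` is hypothetical, and the edge handling (shifting the window inward so every token still sees exactly `window` neighbors) is one common convention assumed here, not something the abstract specifies. It illustrates only the attention pattern, not the kernels the paper describes.

```python
def neighborhood_indices(i, n, window, dilation=1):
    """Return the indices token i attends to in a 1-D sequence of length n.

    Assumed convention: the window is centered on i where possible and
    shifted inward at the edges, so every token sees exactly `window` tokens.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    if dilation < 1 or window * dilation > n:
        raise ValueError("need dilation >= 1 and window * dilation <= n")
    if not 0 <= i < n:
        raise ValueError("token index out of range")

    group = i % dilation    # tokens with the same residue form one dilated group
    pos = i // dilation     # position of token i inside its group
    group_len = (n - group + dilation - 1) // dilation  # tokens in that group

    # Center the window on pos, then clamp it so it stays inside the group.
    start = pos - window // 2
    start = max(0, min(start, group_len - window))

    # Map group-local positions back to indices in the full sequence.
    return [group + (start + j) * dilation for j in range(window)]


if __name__ == "__main__":
    n = 8
    for i in range(n):
        print(i, neighborhood_indices(i, n, window=3, dilation=2))
```

Under these assumptions, a window of 1 makes each token attend only to itself, which corresponds to the linear-projection end of the spectrum, while a window equal to the sequence length (with dilation 1) recovers full self attention.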

implementation, kernel, neighborhood attention

Country:
- South America > Chile
  - Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States
  - Georgia > Fulton County > Atlanta (0.04)

Genre:
- Research Report (0.93)

Industry:
- Education (0.67)
- Government > Regional Government
  - North America Government > United States Government (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (0.93)
  - Representation & Reasoning (0.88)
  - Vision (0.68)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
