cf78a15772ec1a6aee9bbee2d2b382c3-Supplemental-Conference.pdf

Feb-12-2026, 00:47:28 GMT–Neural Information Processing Systems

Our first step is to prove the parameterization (Eq. 3) provides local attention after the [...] Note that the weight and bias terms in the above formulation (Eq. [...] Assume the position-based function at each head is learned to perform 'hard attention' on one of its surrounding positions, i.e., an extreme semi-dynamic attention. To demonstrate this phenomenon, we plot and compare the impacts of Φc and Φp on Φa in the middle and right of Fig. S4 and visualize the learned position-based attention Φp of iRPE in Fig. S5. As seen from Tab. S17, there exist noticeable performance gaps between the models (b, f, g, h) (without Φp) and (a, d, e, i) (with Φp). Without adaptive attention (model (c)), Φp imposes stronger locality on every layer.
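
The 'hard attention' assumption in the excerpt can be illustrated with a small sketch. Eq. 3 is not reproduced here, so the mixing rule below (element-wise product of content scores Φc and position scores Φp, followed by row normalization) is an assumption made only for illustration, as are the 1-D token layout and all function names (`mixed_attention`, `hard_position_scores`, `attend`). Under that assumption, a one-hot Φp makes the mixed attention Φa ignore the content scores entirely and copy the value at a fixed neighboring position, which is a purely local operation.

```python
import math
import random


def mixed_attention(phi_c, phi_p):
    """Assumed mixing rule: element-wise product of content and position
    scores, normalized so each query row sums to 1."""
    rows = []
    for c_row, p_row in zip(phi_c, phi_p):
        scores = [c * p for c, p in zip(c_row, p_row)]
        total = sum(scores)
        rows.append([s / total for s in scores])
    return rows


def hard_position_scores(n, offset):
    """One-hot position scores: query i attends only to key i + offset
    (clamped at the sequence borders). Hypothetical helper."""
    rows = []
    for i in range(n):
        target = min(max(i + offset, 0), n - 1)
        rows.append([1.0 if j == target else 0.0 for j in range(n)])
    return rows


def attend(phi_a, values):
    """Weighted sum of value vectors for every query."""
    dim = len(values[0])
    return [[sum(w * v[k] for w, v in zip(row, values)) for k in range(dim)]
            for row in phi_a]


random.seed(0)
n, d = 6, 4  # toy sizes: 6 tokens on a 1-D line, 4 channels
values = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
# Strictly positive content scores, e.g. exponentiated similarities.
phi_c = [[math.exp(random.gauss(0, 1)) for _ in range(n)] for _ in range(n)]
phi_p = hard_position_scores(n, offset=1)

phi_a = mixed_attention(phi_c, phi_p)
out = attend(phi_a, values)

# Reference: shift the values by one position (border token repeats).
shifted = [values[min(i + 1, n - 1)] for i in range(n)]
```

In this toy setting `phi_a` equals `phi_p` and `out` equals `shifted`, whatever the content scores are. A softer Φp would let the content scores re-weight the positions inside the local window instead of being ignored.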

Industry:
- Health & Medicine (0.46)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (0.94)

Title
Peripheral Vision Transformers - Supplementary Materials - Juhong Min
