Supplementary for SOFT: Softmax-free Transformer with Linear Complexity
–Neural Information Processing Systems
And $\{\phi_i(x)\}$ are $p$-orthogonal:
$$\int \phi_i(x)\,\phi_j(x)\,p(x)\,dx = \delta_{ij}, \tag{4}$$
where $\delta_{ij}$ is 0 when $i \neq j$ and 1 when $i = j$.
$$S^{(m)} U^{(m)} = U^{(m)} \Lambda^{(m)}, \tag{7}$$
Li Zhang (lizhangfd@fudan.edu.cn) is the corresponding author, with the School of Data Science, Fudan University.
More specifically, the relation between any two tokens is reconstructed via sampled bottleneck tokens. Further, exact Gaussian kernel attention computation leads to training difficulties. However, it turns out to suffer from a similar failure.
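The reconstruction of token relations through sampled bottleneck tokens, together with the eigendecomposition in (7), can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes a unit-bandwidth Gaussian kernel, uniform random sampling of the bottleneck tokens, and a pseudo-inverse from `np.linalg.pinv`; none of these choices is stated in the text above, and the function names `gaussian_kernel` and `bottleneck_reconstruction` are hypothetical.

```python
import numpy as np


def gaussian_kernel(a, b):
    """Pairwise Gaussian kernel exp(-||a_i - b_j||^2 / 2) between rows of a and b.

    Assumption: unit bandwidth; the text does not give the scaling.
    """
    sq_dist = (
        (a ** 2).sum(axis=1)[:, None]
        + (b ** 2).sum(axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clamp tiny negative values caused by floating-point round-off.
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0))


def bottleneck_reconstruction(x, m, rng):
    """Rebuild the n x n kernel matrix from m sampled bottleneck tokens.

    Assumption: bottleneck tokens are drawn uniformly at random without
    replacement (a hypothetical choice made for this sketch).
    """
    idx = rng.choice(x.shape[0], size=m, replace=False)
    tokens = x[idx]
    c = gaussian_kernel(x, tokens)         # (n, m): all tokens vs. bottleneck tokens
    s_m = gaussian_kernel(tokens, tokens)  # (m, m): bottleneck tokens vs. themselves
    lam, u = np.linalg.eigh(s_m)           # S^(m) U^(m) = U^(m) Lambda^(m)
    s_hat = c @ np.linalg.pinv(s_m) @ c.T  # relation between any two tokens
    return s_hat, s_m, lam, u
```

In this sketch the columns of `u` are orthonormal, which is the finite-sample counterpart of the orthogonality condition in (4). When `m` equals the number of tokens and the kernel matrix is well conditioned, `s_hat` matches the full kernel matrix up to numerical error; for smaller `m` it is a low-rank approximation that needs only an n x m and an m x m kernel evaluation.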
Feb-10-2026, 18:43:35 GMT