

Supplementary for SOFT: Softmax-free Transformer with Linear Complexity

Neural Information Processing Systems

Moreover, the functions $\{\phi_i(x)\}$ are $p$-orthogonal:

$$\int \phi_i(x)\,\phi_j(x)\,p(x)\,dx = \delta_{ij}, \qquad (4)$$

where $\delta_{ij}$ is 0 when $i \neq j$ and 1 when $i = j$.

$$S^{(m)} U^{(m)} = U^{(m)} \Lambda^{(m)}, \qquad (7)$$

Li Zhang (lizhangfd@fudan.edu.cn) is the corresponding author, with the School of Data Science, Fudan University.

More specifically, the relation between any two tokens is reconstructed via sampled bottleneck tokens. Further, exact Gaussian kernel attention computation leads to training difficulties. However, it turns out to suffer from a similar failure.
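Equation (4) states $p$-orthogonality abstractly. As a purely illustrative check, assume $p$ is the standard normal density and $\phi_n$ is the probabilists' Hermite polynomial $He_n$ divided by $\sqrt{n!}$. This is a textbook $p$-orthonormal family, not the kernel eigenfunctions used in the supplementary. The Gram matrix of integrals $\int \phi_i \phi_j p\,dx$, approximated with Gauss-Hermite quadrature, should then equal $\delta_{ij}$, i.e. the identity. The helper names `normalized_hermite` and `gram_matrix` are hypothetical.

```python
import math

import numpy as np
from numpy.polynomial import hermite_e as He


def normalized_hermite(n: int, x: np.ndarray) -> np.ndarray:
    """phi_n(x) = He_n(x) / sqrt(n!), orthonormal under the N(0, 1) density (assumed example)."""
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0  # select the single basis polynomial He_n
    return He.hermeval(x, coeffs) / math.sqrt(math.factorial(n))


def gram_matrix(num_funcs: int, quad_order: int = 40) -> np.ndarray:
    """G[i, j] approximates the integral of phi_i(x) phi_j(x) p(x) dx with p = N(0, 1)."""
    # hermegauss nodes/weights integrate f(x) * exp(-x^2 / 2) exactly for low-degree polynomials
    x, w = He.hermegauss(quad_order)
    w = w / math.sqrt(2.0 * math.pi)  # rescale exp(-x^2 / 2) into the normal density p(x)
    phis = np.stack([normalized_hermite(n, x) for n in range(num_funcs)])
    return (phis * w) @ phis.T  # weighted inner products of every pair of functions


G = gram_matrix(5)
print(np.allclose(G, np.eye(5)))  # True: off-diagonal entries vanish, diagonal entries are 1
```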
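The eigendecomposition in (7) and the idea of reconstructing every token pair through sampled bottleneck tokens can be combined into a small Nyström sketch. It assumes a Gaussian kernel $\exp(-\lVert q_i - q_j \rVert^2 / (2\sqrt{d}))$ with the queries also serving as keys. It selects the $m$ bottleneck tokens by uniform random sampling and builds the pseudo-inverse of $S^{(m)}$ from its eigenpairs. The sampling scheme, the $\sqrt{d}$ scaling, the eigenvalue cutoff, and the function names `gaussian_kernel` and `nystrom_attention` are assumptions for illustration. They are not claimed to match how the authors select bottleneck tokens or compute the pseudo-inverse.

```python
import numpy as np


def gaussian_kernel(a: np.ndarray, b: np.ndarray, scale: float) -> np.ndarray:
    """exp(-||a_i - b_j||^2 / (2 * scale)) for every pair of rows of a and b."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * scale))  # clip tiny negatives from rounding


def nystrom_attention(q: np.ndarray, m: int, seed: int = 0) -> np.ndarray:
    """Low-rank Nystrom approximation of the n x n Gaussian kernel matrix S(q, q)."""
    n, d = q.shape
    scale = np.sqrt(d)  # assumed sqrt(d) scaling inside the kernel
    rng = np.random.default_rng(seed)
    idx = rng.choice(n, size=m, replace=False)  # hypothetical: uniform sampling of bottleneck tokens
    q_tilde = q[idx]
    s_mm = gaussian_kernel(q_tilde, q_tilde, scale)  # S^(m): kernel among bottleneck tokens, m x m
    s_nm = gaussian_kernel(q, q_tilde, scale)        # kernel between all tokens and bottlenecks, n x m
    # Eigendecomposition S^(m) U^(m) = U^(m) Lambda^(m); eigh suits the symmetric S^(m)
    lam, u = np.linalg.eigh(s_mm)
    # Pseudo-inverse: invert eigenvalues above a tolerance, zero out the rest
    tol = lam.max() * m * np.finfo(lam.dtype).eps
    keep = lam > tol
    inv_lam = np.where(keep, 1.0 / np.where(keep, lam, 1.0), 0.0)
    s_mm_pinv = (u * inv_lam) @ u.T
    # Each pair (i, j) is reconstructed through the m bottleneck tokens
    return s_nm @ s_mm_pinv @ s_nm.T


q = np.random.default_rng(1).standard_normal((8, 4))
exact = gaussian_kernel(q, q, np.sqrt(4))
approx = nystrom_attention(q, m=8)
print(np.abs(approx - exact).max())  # small when every token is a bottleneck token (m == n)
```

With $m = n$ the reconstruction reproduces the exact kernel matrix up to rounding. With $m \ll n$ the approximation has rank at most $m$. Applied to a value matrix as `s_nm @ (s_mm_pinv @ (s_nm.T @ v))` instead of being materialized as above, the cost becomes linear in $n$.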