Supplementary for SOFT: Softmax-free Transformer with Linear Complexity

Feb-10-2026, 18:43:35 GMT–Neural Information Processing Systems

And $\{\phi_i(x)\}$ are $p$-orthogonal: $\int \phi_i(x)\,\phi_j(x)\,p(x)\,dx = \delta_{ij}$ (4), where $\delta_{ij}$ is 0 when $i \neq j$ and 1 when $i = j$. $S^{(m)} U^{(m)} = U^{(m)} \Lambda^{(m)}$ (7).

Li Zhang (lizhangfd@fudan.edu.cn) is the corresponding author, with the School of Data Science, Fudan University.

More specifically, the relation between any two tokens is reconstructed via sampled bottleneck tokens. Further, exact Gaussian kernel attention computation leads to training difficulties. However, it turns out to suffer from a similar failure.
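To make the idea of reconstructing token-to-token relations through sampled bottleneck tokens concrete, here is a minimal NumPy sketch of a Nyström-style low-rank approximation of a Gaussian kernel matrix. It is an illustration under stated assumptions, not the authors' implementation: the function names `gaussian_kernel` and `nystrom_kernel`, the unit kernel bandwidth, the strided choice of sampled tokens, and the use of `np.linalg.pinv` are all choices made here for the example. The full n x n matrix is built only to compare against the approximation.

```python
import numpy as np

def gaussian_kernel(a, b):
    """Pairwise Gaussian kernel exp(-||a_i - b_j||^2 / 2) between rows of a and b."""
    sq = (a * a).sum(1)[:, None] + (b * b).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * np.maximum(sq, 0.0))  # clamp tiny negative round-off

def nystrom_kernel(x, idx):
    """Approximate the n x n kernel matrix from m sampled rows of x (assumed sampling)."""
    bottleneck = x[idx]                             # (m, d) sampled tokens
    k_nm = gaussian_kernel(x, bottleneck)           # (n, m) token-to-sample relations
    k_mm = gaussian_kernel(bottleneck, bottleneck)  # (m, m) sample-to-sample relations
    return k_nm @ np.linalg.pinv(k_mm) @ k_nm.T     # (n, n) reconstructed relations

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 4))          # 16 toy tokens of dimension 4
idx = np.arange(0, 16, 2)             # hypothetical choice: every second token
full = gaussian_kernel(x, x)
approx = nystrom_kernel(x, idx)
max_error = np.abs(full - approx).max()
```

In this sketch the columns belonging to the sampled tokens are reproduced up to numerical precision, and the remaining entries are interpolated through the m x m block. In practice one would apply the three factors to the value matrix in sequence rather than forming the n x n product.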



