Supplementary for SOFT: Softmax-free Transformer with Linear Complexity

Aug-16-2025, 22:40:02 GMT–Neural Information Processing Systems

According to the eigenfunction's definition, we can get: k(y, x)φ …

Li Zhang (lizhangfd@fudan.edu.cn) is the corresponding author with the School of Data Science, Fudan …

In our formulation, instead of directly calculating the Gaussian kernel weights, we approximate them. More specifically, the relation between any two tokens is reconstructed via sampled bottleneck tokens.

However, it turns out to suffer from a similar failure.

For each model, we show the output from the first two attention heads (top and bottom rows).
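The idea of reconstructing token-to-token relations through sampled bottleneck tokens can be illustrated with a generic Nyström-style sketch in NumPy. This is not the authors' implementation: the function names, the random sampling of bottleneck tokens, the kernel scale of sqrt(d), and the use of a pseudo-inverse are all assumptions made for illustration.

```python
import numpy as np


def gaussian_kernel(a, b, scale):
    """Gaussian kernel weights exp(-||a_i - b_j||^2 / (2 * scale))."""
    sq_dist = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    # Clamp tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * scale))


def approx_kernel_attention(x, v, m, rng):
    """Approximate K @ v, where K is the n x n Gaussian kernel of the tokens x.

    Hypothetical helper: m bottleneck tokens are sampled at random (an assumption),
    and K is approximated as K_nm @ pinv(K_mm) @ K_nm.T without forming K itself.
    """
    n, d = x.shape
    scale = np.sqrt(d)  # assumed scaling, chosen for illustration
    idx = rng.choice(n, size=m, replace=False)
    bottleneck = x[idx]
    k_nm = gaussian_kernel(x, bottleneck, scale)           # shape (n, m)
    k_mm = gaussian_kernel(bottleneck, bottleneck, scale)  # shape (m, m)
    # Multiply right to left so the cost grows linearly with n for fixed m.
    return k_nm @ (np.linalg.pinv(k_mm) @ (k_nm.T @ v))


rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8))  # 16 tokens of dimension 8
v = rng.standard_normal((16, 4))  # values to be mixed by the kernel weights
out = approx_kernel_attention(x, v, m=4, rng=rng)
```

Grouping the products right to left keeps every intermediate matrix at size n x m or smaller, so the full n x n kernel matrix is never materialized. When m equals n the sketch reproduces the exact kernel product, which is a convenient sanity check.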

approximation, artificial intelligence, machine learning
