Metric Transforms and Low Rank Representations of Kernels for Fast Attention

Neural Information Processing Systems 

This suggests that the low-rank approach to fast attention only works for kernel functions that can be approximated by polynomials.
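The sketch below shows why polynomial kernels admit fast attention. It assumes the usual kernelized-attention setup: each attention weight is a scalar function f applied to the inner product q·k, and rows are normalized to sum to one. The kernel (1 + x)^2, the feature map `poly_features`, and the helpers `quadratic_attention` and `low_rank_attention` are illustrative choices, not the paper's construction. Because (1 + q·k)^2 equals phi(q)·phi(k) for an explicit feature map of dimension r = 1 + d + d^2, the n x n weight matrix factors as Phi(Q) Phi(K)^T. Attention can then be computed in O(n·r·d_v) time without ever forming that matrix.

```python
import math
import random


def poly_features(x):
    """Hypothetical feature map with phi(q) . phi(k) == (1 + q . k) ** 2."""
    feats = [1.0]                                   # constant term: 1
    feats += [math.sqrt(2.0) * xa for xa in x]      # linear terms: 2 * (q . k)
    feats += [xa * xb for xa in x for xb in x]      # quadratic terms: (q . k) ** 2
    return feats


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def quadratic_attention(Q, K, V, kernel):
    """Exact attention: evaluates all n * n weights kernel(q_i . k_j)."""
    out = []
    for q in Q:
        weights = [kernel(dot(q, k)) for k in K]
        total = sum(weights)
        out.append([sum(w * v[c] for w, v in zip(weights, V)) / total
                    for c in range(len(V[0]))])
    return out


def low_rank_attention(Q, K, V, feature_map):
    """Same result via A = Phi(Q) Phi(K)^T, never forming the n x n matrix."""
    phis_k = [feature_map(k) for k in K]
    r, d_v = len(phis_k[0]), len(V[0])
    # S = sum_j phi(k_j) v_j^T (r x d_v) and z = sum_j phi(k_j) (length r).
    S = [[0.0] * d_v for _ in range(r)]
    z = [0.0] * r
    for phi, v in zip(phis_k, V):
        for a in range(r):
            z[a] += phi[a]
            for c in range(d_v):
                S[a][c] += phi[a] * v[c]
    out = []
    for q in Q:
        phi_q = feature_map(q)
        denom = dot(phi_q, z)  # equals the row sum of kernel weights
        out.append([sum(phi_q[a] * S[a][c] for a in range(r)) / denom
                    for c in range(d_v)])
    return out


# Usage: both paths agree up to floating-point rounding.
rng = random.Random(0)
n, d, d_v = 50, 3, 2
Q = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(n)]
K = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(n)]
V = [[rng.uniform(-1.0, 1.0) for _ in range(d_v)] for _ in range(n)]

exact = quadratic_attention(Q, K, V, lambda s: (1.0 + s) ** 2)
fast = low_rank_attention(Q, K, V, poly_features)
max_err = max(abs(a - b) for ra, rb in zip(exact, fast) for a, b in zip(ra, rb))
print(max_err < 1e-9)  # True
```

For a degree-p polynomial the number of monomial features grows roughly like d^p. The rank saving is therefore meaningful only when f is well approximated by a low-degree polynomial.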
