Luna: Linear Unified Nested Attention
Xuezhe Ma

Neural Information Processing Systems 

The quadratic computational and memory complexities of the Transformer's attention mechanism have limited its scalability for modeling long sequences.
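To make the quadratic cost concrete, here is a minimal sketch (not from the paper) of standard scaled dot-product attention; the n x n score matrix is what grows quadratically with the sequence length n. The function name and shapes are illustrative assumptions.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Standard attention over a single sequence: q, k, v have shape (n, d).

    The score matrix has shape (n, n), so both time and memory grow
    quadratically with the sequence length n.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                    # (n, n) -- quadratic in n
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ v                               # (n, d)

# Illustrative run: at n = 512 the score matrix already holds 512**2 entries,
# and it grows 4x every time the sequence length doubles.
n, d = 512, 64
q = np.random.randn(n, d)
k = np.random.randn(n, d)
v = np.random.randn(n, d)
out = scaled_dot_product_attention(q, k, v)          # out.shape == (512, 64)
```

Luna's goal, as the abstract indicates, is to avoid materializing this quadratic attention matrix so that longer sequences become tractable.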
