Luna: Linear Unified Nested Attention

Neural Information Processing Systems 

The quadratic computational and memory complexities of the Transformer's attention mechanism have limited its scalability for modeling long sequences.
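To make the complexity gap concrete, here is a minimal NumPy sketch of the nested-attention idea behind Luna: a fixed-length extra sequence first "packs" the input via one attention step, and the input then "unpacks" it via a second attention step, so cost scales with sequence length times the fixed length rather than quadratically. Function names, the length `l` of the extra sequence, and all dimensions are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # standard scaled dot-product attention (no learned projections,
    # for brevity of this sketch)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def luna_attention(x, p):
    # pack: fixed-length sequence p (l x d) attends over input x (n x d)
    # -> cost O(l * n) instead of O(n^2)
    packed = attention(p, x, x)
    # unpack: input x attends over the short packed sequence (l x d)
    # -> cost O(n * l)
    return attention(x, packed, packed), packed

rng = np.random.default_rng(0)
n, l, d = 128, 16, 32        # sequence length, extra-sequence length, model dim
x = rng.standard_normal((n, d))
p = rng.standard_normal((l, d))
out, packed = luna_attention(x, p)
print(out.shape, packed.shape)  # (128, 32) (16, 32)
```

Because `l` is a fixed constant, both attention steps are linear in the sequence length `n`, which is the source of the "linear" in the title.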
