Luna: Linear Unified Nested Attention
Xuezhe Ma
Neural Information Processing Systems
The quadratic computational and memory complexities of the Transformer's attention mechanism have limited its scalability for modeling long sequences.
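To make the quadratic cost concrete, here is a minimal NumPy sketch of standard scaled dot-product attention (the baseline Luna improves on, not Luna's linear mechanism itself); the function name and shapes are illustrative assumptions. The score matrix is (n, n) in sequence length n, which is where the quadratic time and memory come from.

```python
import numpy as np

def full_attention(Q, K, V):
    # Standard scaled dot-product attention (illustrative sketch).
    # The score matrix below has shape (n, n): quadratic in sequence
    # length n in both compute and memory.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                      # (n, n)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V                                 # (n, d_v)

n, d = 8, 4
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
print(full_attention(Q, K, V).shape)  # (8, 4)
```

Doubling n quadruples the size of `scores`, which is exactly the scalability limit for long sequences noted above.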