Long-Short Transformer: Efficient Transformers for Language and Vision

Jan-17-2025, 14:18:07 GMT–Neural Information Processing Systems

Transformers have achieved success in both language and vision domains. However, scaling them to long sequences such as long documents or high-resolution images is prohibitively expensive, because the self-attention mechanism has quadratic time and memory complexity with respect to the input sequence length. In this paper, we propose Long-Short Transformer (Transformer-LS), an efficient self-attention mechanism for modeling long sequences with linear complexity for both language and vision tasks. It aggregates a novel long-range attention with dynamic projection to model distant correlations and a short-term attention to capture fine-grained local correlations. We propose a dual normalization strategy to account for the scale mismatch between the two attention mechanisms.
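The three components named in the abstract can be illustrated with a small single-head NumPy sketch. This is not the authors' implementation. The abstract does not give the details, so the following are assumptions of the sketch: the dynamic projection is a softmax over the sequence axis of `k @ w_proj`, the dual normalization is a parameter-free layer norm applied separately to the local and the projected keys and values, and the short-term attention is a symmetric sliding window. The names `long_short_attention`, `w_proj`, and `window` are hypothetical.

```python
import numpy as np


def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def layer_norm(x, eps=1e-5):
    # Parameter-free layer norm over the feature axis (assumed form).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def long_short_attention(q, k, v, w_proj, window):
    """q, k, v: (n, d) arrays; w_proj: (d, r) array; window: int >= 0."""
    n, d = q.shape

    # Long-range branch: compress the n keys/values into r summaries.
    # The projection weights depend on the input k, hence "dynamic".
    p = softmax(k @ w_proj, axis=0)      # (n, r), each column sums to 1
    k_long = layer_norm(p.T @ k)         # (r, d)
    v_long = layer_norm(p.T @ v)         # (r, d)

    # Short-term branch: normalized separately from the long-range
    # branch so both have a comparable scale (dual normalization).
    k_loc = layer_norm(k)
    v_loc = layer_norm(v)

    out = np.empty_like(q)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        # Each query sees at most 2 * window + 1 local keys plus r
        # global summaries, so the total cost grows linearly with n.
        keys = np.concatenate([k_loc[lo:hi], k_long], axis=0)
        vals = np.concatenate([v_loc[lo:hi], v_long], axis=0)
        scores = keys @ q[i] / np.sqrt(d)
        out[i] = softmax(scores, axis=0) @ vals
    return out
```

With fixed `window` and `r`, every query attends to a bounded number of keys, which is where the linear scaling in sequence length comes from in this sketch. The Python loop is kept for readability; a practical version would batch the windows.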

efficient transformer, long-short transformer, transformer, (7 more...)

