AITopics | efficient attention

SMYRF - Efficient Attention using Asymmetric Clustering

Neural Information Processing Systems | Dec-24-2025, 00:14:08 GMT

We propose a novel type of balanced clustering algorithm to approximate attention. Attention complexity is reduced from $O(N^2)$ to $O(N \log N)$, where N is the sequence length. Our algorithm, SMYRF, uses Locality Sensitive Hashing (LSH) in a novel way by defining new Asymmetric transformations and an adaptive scheme that produces balanced clusters. The biggest advantage of SMYRF is that it can be used as a drop-in replacement for dense attention layers without any retraining. On the contrary, prior fast attention methods impose constraints (e.g.
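The idea of approximating attention with balanced clusters can be illustrated with a minimal NumPy sketch. It is not the SMYRF algorithm: the asymmetric transformations and the adaptive scheme from the abstract are replaced here by a single random projection followed by sorting, and the names `clustered_attention` and `n_clusters` are hypothetical. It only shows why equal-size clusters reduce the cost: sorting takes O(N log N), and each query attends to a fixed number of keys instead of all N.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def dense_attention(Q, K, V):
    # Standard attention: every query attends to every key, O(N^2).
    return softmax(Q @ K.T / np.sqrt(Q.shape[-1])) @ V

def clustered_attention(Q, K, V, n_clusters, seed=0):
    # Assumption: a single random projection stands in for the LSH step.
    N, d = Q.shape
    assert K.shape[0] == N and N % n_clusters == 0
    rng = np.random.default_rng(seed)
    direction = rng.standard_normal(d)
    # Sorting by the projected value and cutting into equal chunks
    # yields balanced clusters of queries and of keys.
    q_order = np.argsort(Q @ direction)
    k_order = np.argsort(K @ direction)
    size = N // n_clusters
    out = np.zeros((N, V.shape[1]))
    for c in range(n_clusters):
        qi = q_order[c * size:(c + 1) * size]
        ki = k_order[c * size:(c + 1) * size]
        # Dense attention restricted to one cluster of queries and keys.
        scores = Q[qi] @ K[ki].T / np.sqrt(d)
        out[qi] = softmax(scores) @ V[ki]
    return out

rng = np.random.default_rng(1)
Q = rng.standard_normal((16, 4))
K = rng.standard_normal((16, 4))
V = rng.standard_normal((16, 3))
approx = clustered_attention(Q, K, V, n_clusters=4)
exact = dense_attention(Q, K, V)
```

With `n_clusters=1` the sketch reduces to dense attention. With more clusters it trades accuracy for speed, and how much accuracy is lost depends on how well the hashing groups the queries with their highest-scoring keys.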

efficient attention, name change, smyrf, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.40)
Information Technology > Artificial Intelligence > Natural Language (0.37)

SMYRF - Efficient Attention using Asymmetric Clustering

Neural Information Processing Systems | May-26-2025, 22:03:03 GMT

We propose a novel type of balanced clustering algorithm to approximate attention. Attention complexity is reduced from $O(N^2)$ to $O(N \log N)$, where N is the sequence length. Our algorithm, SMYRF, uses Locality Sensitive Hashing (LSH) in a novel way by defining new Asymmetric transformations and an adaptive scheme that produces balanced clusters. The biggest advantage of SMYRF is that it can be used as a drop-in replacement for dense attention layers without any retraining. On the contrary, prior fast attention methods impose constraints (e.g.

machine learning, natural language, smyrf, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.43)
Information Technology > Artificial Intelligence > Natural Language (0.42)

Review for NeurIPS paper: SMYRF - Efficient Attention using Asymmetric Clustering

Neural Information Processing Systems | Jan-24-2025, 00:52:40 GMT

This paper proposes a method for reducing the quadratic bottleneck of transformer architectures to O(N log N), using an asymmetric LSH clustering strategy. The paper also shows that finding an optimal assignment is NP-hard and that heuristic approaches must therefore be pursued. The authors propose a novel type of balanced clustering algorithm to approximate attention. The method can be applied directly to pre-trained models and achieves competitive or better performance with BigGAN/BERT/RoBERTa while reducing memory by 50%. There was some disagreement among reviewers about this paper, with R1 and R3 recommending solid acceptance, and R2 and R4 recommending weak reject.

asymmetric clustering, efficient attention, neurips paper, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.95)

On Explaining with Attention Matrices

Naim, Omar, Asher, Nicholas

arXiv.org Artificial Intelligence | Oct-24-2024

This paper explores the much-discussed, possible explanatory link between attention weights (AW) in transformer models and predicted output. Contrary to intuition and early research on attention, more recent prior research has provided formal arguments and empirical evidence that AW are not explanatorily relevant. We show that the formal arguments are incorrect. We introduce and effectively compute efficient attention, which isolates the effective components of attention matrices in tasks and models in which AW play an explanatory role. We show that efficient attention has a causal role (provides minimally necessary and sufficient conditions) for predicting model output in NLP tasks requiring contextual information, and we show, contrary to [7], that efficient attention matrices are probability distributions and are effectively calculable. Thus, they should play an important part in the explanation of attention-based model behavior. We offer empirical experiments in support of our method, illustrating various properties of efficient attention with various metrics on four datasets.

attention matrix, matrix, prediction, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.3233/FAIA240594

arXiv: 2410.18541

Country: Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

SMYRF - Efficient Attention using Asymmetric Clustering

Neural Information Processing Systems | Oct-10-2024, 02:51:26 GMT

asymmetric clustering, efficient attention, smyrf, (1 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.43)
Information Technology > Artificial Intelligence > Natural Language (0.42)

You Need to Pay Better Attention: Rethinking the Mathematics of Attention Mechanism

Hosseini, Mehran, Hosseini, Peyman

arXiv.org Artificial Intelligence | May-30-2024

Scaled Dot Product Attention (SDPA) is the backbone of many modern deep-learning models. It is so versatile that it has been used in natural language, vision, and multi-modal domains with very little change compared to its original formulation. This paper discusses why the current formulation is inefficient by delving into the mathematical details of the attention mechanism. We propose three improvements to mitigate these inefficiencies, thereby introducing three enhanced attention mechanisms: Optimised, Efficient, and Super Attention. Optimised and Efficient Attention have one and two matrix multiplications fewer per head, respectively, and 25% and 50% fewer parameters, respectively, than standard SDPA, but perform similarly to standard SDPA in both vision and natural language tasks. They can be used in all applications where SDPA is used while offering smaller model sizes and faster training and inference without noticeable loss in performance. Super Attention introduces a new linear transformation on the values, transforming them from the left. It outperforms standard SDPA on vision and natural language tasks by up to 17% while having one fewer matrix multiplication per head and 25% fewer parameters than standard SDPA. Consequently, it is also faster than standard SDPA. Super Attention is ideal in applications where the attention layer's context length is fixed, such as Vision Transformers. In addition to providing mathematical reasoning, we evaluate the presented attention mechanisms on several datasets including MNIST, CIFAR100, ImageNet, IMDB Movie Reviews, and Amazon Reviews datasets, as well as combined Europarl and Anki English-Spanish datasets for neural machine translation.
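The parameter arithmetic in this abstract can be made concrete with a small NumPy sketch. It assumes a single head whose query, key, value, and output projections are all d x d matrices (an assumption, not a detail from the abstract). It also shows one generic way a matrix multiplication can be saved: two consecutive linear maps can be merged into one. Whether the paper's Optimised or Efficient Attention uses exactly this merge is not stated in the abstract, so the merged variant below is illustrative only.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def sdpa_head(X, W_q, W_k, W_v, W_o):
    # Standard single-head scaled dot-product attention with output projection.
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    A = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))
    return A @ V @ W_o

rng = np.random.default_rng(0)
n, d = 6, 8  # hypothetical sequence length and model width
X = rng.standard_normal((n, d))
W_q, W_k, W_v, W_o = (rng.standard_normal((d, d)) for _ in range(4))

baseline = sdpa_head(X, W_q, W_k, W_v, W_o)

# By associativity, A @ (X @ W_v) @ W_o == A @ X @ (W_v @ W_o), so the two
# projections can be stored and applied as one d x d matrix.
W_vo = W_v @ W_o
A = softmax((X @ W_q) @ (X @ W_k).T / np.sqrt(d))
merged = A @ X @ W_vo

# Parameter counts under the d x d assumption.
params_standard = 4 * d * d   # W_q, W_k, W_v, W_o
params_three = 3 * d * d      # one matrix fewer: 25% reduction
params_two = 2 * d * d        # two matrices fewer: 50% reduction
```

Under these assumptions, removing one of four equally sized matrices gives the 25% figure and removing two gives the 50% figure quoted in the abstract.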

attention mechanism, efficient attention, standard attention, (15 more...)

arXiv.org Artificial Intelligence

arXiv: 2403.01643

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > United Kingdom > England > Greater London > London (0.04)

Genre: Research Report (0.70)

Industry:

Information Technology (0.46)
Leisure & Entertainment (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.89)

Design and Analysis of Efficient Attention in Transformers for Social Group Activity Recognition

Tamura, Masato

arXiv.org Artificial Intelligence | Apr-15-2024

Social group activity recognition is a challenging task extended from group activity recognition, where social groups must be recognized with their activities and group members. Existing methods tackle this task by leveraging region features of individuals following existing group activity recognition methods. However, the effectiveness of region features is susceptible to person localization and variable semantics of individual actions. To overcome these issues, we propose leveraging attention modules in transformers to generate social group features. In this method, multiple embeddings are used to aggregate features for a social group, each of which is assigned to a group member without duplication. Due to this non-duplicated assignment, the number of embeddings must be large to avoid missing group members, which renders attention in transformers ineffective. To find optimal attention designs with a large number of embeddings, we explore several design choices of queries for feature aggregation and self-attention modules in transformer decoders. Extensive experimental results show that the proposed method achieves state-of-the-art performance and verify that the proposed attention designs are highly effective on social group activity recognition.

activity recognition, group activity recognition, recognition, (15 more...)

arXiv.org Artificial Intelligence

arXiv: 2404.09964

Country: North America > United States > California (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

CAB: Comprehensive Attention Benchmarking on Long Sequence Modeling

Zhang, Jun, Jiang, Shuyang, Feng, Jiangtao, Zheng, Lin, Kong, Lingpeng

arXiv.org Artificial Intelligence | Jul-2-2023

The transformer has achieved remarkable success in language, image, and speech processing. Recently, various efficient attention architectures have been proposed to improve the transformer's efficiency while largely preserving its efficacy, especially in modeling long sequences. A widely used benchmark to test these efficient methods' capability on long-range modeling is Long Range Arena (LRA). However, LRA only focuses on the standard bidirectional (or noncausal) self-attention, and completely ignores cross attentions and unidirectional (or causal) attentions, which are equally important to downstream applications. In this paper, we propose Comprehensive Attention Benchmark (CAB) under a fine-grained attention taxonomy with four distinguishable attention patterns, namely, noncausal self, causal self, noncausal cross, and causal cross attentions. CAB collects seven real-world tasks from different research areas to evaluate efficient attentions under the four attention patterns. Among these tasks, CAB validates efficient attentions in eight backbone networks to show their generalization across neural architectures. We conduct exhaustive experiments to benchmark the performances of nine widely used efficient attention architectures designed with different philosophies on CAB. Extensive experimental results also shed light on the fundamental problems of efficient attentions, such as efficiency length against vanilla attention, performance consistency across attention patterns, the benefit of attention mechanisms, and interpolation/extrapolation on long-context language modeling.
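The four attention patterns named in this abstract can be sketched with plain (vanilla) attention in NumPy. Two assumptions are made here that the abstract does not spell out: "self" means queries, keys, and values come from the same sequence while "cross" means keys and values come from a separate context sequence, and "causal" means query i may only see keys at positions j <= i. The function name `attention` and the toy shapes are hypothetical.

```python
import numpy as np

def attention(Q, K, V, causal=False):
    # Vanilla scaled dot-product attention with an optional causal mask.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    if causal:
        # Assumption: query i may only attend to keys j <= i.
        mask = np.tril(np.ones(scores.shape, dtype=bool))
        scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8))    # target sequence (queries)
ctx = rng.standard_normal((7, 8))  # separate context sequence

patterns = {
    "noncausal self": attention(x, x, x),
    "causal self": attention(x, x, x, causal=True),
    "noncausal cross": attention(x, ctx, ctx),
    "causal cross": attention(x, ctx, ctx, causal=True),
}
```

In the causal variants the first query can see only the first key, so its output equals the first value row. An efficient attention method that relies on seeing all keys at once may not support this masking directly, which is one reason to benchmark the patterns separately.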

attention pattern, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

arXiv: 2210.07661

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > China > Shanghai > Shanghai (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(6 more...)

Genre:

Research Report (0.64)
Overview (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Speech (0.93)
