AITopics | attention value

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Attribution-Driven Adaptive Token Pruning for Transformers

Neural Information Processing Systems, Jun-12-2026, 01:58:10 GMT

Transformers have been widely adopted in natural language processing, computer vision, and other domains due to their exceptional performance across a variety of tasks. However, the computational cost of Transformers is prohibitively high, particularly for long input sequences, which significantly increases both training and inference time. Although various token pruning methods have been proposed to reduce this computational burden, most approaches overlook critical differences among sequences in length and complexity, leading to suboptimal compression efficiency. In this paper, we propose AD-TP, an Attribution-Driven Adaptive Token Pruning method designed to retain only the most informative tokens. We analyze how well accumulated attention values measure token importance and find that they do not accurately reflect the actual contribution of each token to text understanding.
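
For reference, the accumulated-attention heuristic that the abstract evaluates (and finds wanting) can be sketched as follows. This is the baseline, not AD-TP's attribution method: each token is scored by the total attention it receives (column sums of a row-stochastic attention map, averaged over maps), and the top `keep` tokens are retained in their original order. The function names and toy attention values are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def accumulated_attention(attn_maps: List[Matrix]) -> List[float]:
    """Score each token by the attention it receives.

    Each map is a row-stochastic n x n matrix (row = query, column = key).
    Column sums give the total attention a token receives; the result is
    averaged over all supplied maps (e.g. heads or layers).
    """
    n = len(attn_maps[0])
    scores = [0.0] * n
    for attn in attn_maps:
        for row in attn:
            for j, weight in enumerate(row):
                scores[j] += weight
    return [s / len(attn_maps) for s in scores]


def prune_tokens(tokens: List[str], scores: List[float], keep: int) -> List[str]:
    """Keep the `keep` highest-scoring tokens in their original order."""
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    return [tokens[i] for i in sorted(ranked[:keep])]


tokens = ["[CLS]", "the", "movie", "was", "great"]
attn = [  # toy attention map; each row sums to 1
    [0.2, 0.1, 0.3, 0.1, 0.3],
    [0.1, 0.1, 0.4, 0.1, 0.3],
    [0.1, 0.1, 0.4, 0.1, 0.3],
    [0.1, 0.1, 0.3, 0.1, 0.4],
    [0.2, 0.0, 0.3, 0.1, 0.4],
]
print(prune_tokens(tokens, accumulated_attention([attn]), keep=3))
# ['[CLS]', 'movie', 'great']
```

The weakness the authors point to is visible in the design: a token's score depends only on how much attention it attracts, not on how much it actually changes the model's output.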

artificial intelligence, natural language, proceedings

Technology: Information Technology > Artificial Intelligence > Natural Language (0.96)

aee2f03ecb2b2c1ea55a43946b651cfd-Supplemental-Conference.pdf

Neural Information Processing Systems, Apr-29-2026, 10:25:02 GMT

artificial intelligence, energy 0, machine learning

Industry: Energy (0.31)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.97)

8db9279f593652ee9bb2223b4a2c43fa-Paper-Conference.pdf

Neural Information Processing Systems, Feb-16-2026, 13:37:51 GMT

artificial intelligence, machine learning, natural language

Country:

North America (0.14)
Europe > France > Grand Est > Bas-Rhin > Strasbourg (0.04)
Europe > Spain > Andalusia (0.04)
Europe > Belgium > Flanders (0.04)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.93)
Instructional Material (0.86)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Therapeutic Area (0.68)
Health & Medicine > Nuclear Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.93)
Information Technology > Artificial Intelligence > Natural Language (0.93)

8db9279f593652ee9bb2223b4a2c43fa-Paper-Conference.pdf

Neural Information Processing Systems, Oct-10-2025, 09:09:34 GMT

camelyon16, dataset, interaction

Country:

North America (0.14)
Europe > France > Grand Est > Bas-Rhin > Strasbourg (0.04)
Europe > Spain > Andalusia (0.04)
Europe > Belgium > Flanders (0.04)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.93)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Therapeutic Area (0.68)
Health & Medicine > Nuclear Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.93)
Information Technology > Artificial Intelligence > Natural Language (0.93)

Deep Reinforcement Learning with Stacked Hierarchical Attention for Text-based Games

Neural Information Processing Systems, Aug-16-2025, 04:43:04 GMT

In this paper, our goal is to design an enhanced RL agent with a reasoning process for text-based games.

agent, reasoning, text-based game

Country: North America > Canada (0.04)

Industry: Leisure & Entertainment > Games (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

ToFe: Lagged Token Freezing and Reusing for Efficient Vision Transformer Inference

Zhang, Haoyue, Zhang, Jie, Guo, Song

arXiv.org Artificial Intelligence, Jul-23-2025

Although vision transformers (ViTs) have shown remarkable success in various vision tasks, their computationally expensive self-attention hinders their deployment on resource-constrained devices. Token reduction, which discards less important tokens during forward propagation, has been proposed to enhance the efficiency of transformer models. However, existing methods handle unimportant tokens irreversibly, preventing their reuse in subsequent blocks. Because transformers focus on different information in different blocks, tokens reduced in early blocks might be useful later. Furthermore, to adapt transformer models for resource-constrained devices, it is crucial to strike a balance between model performance and computational overhead. To address these challenges, we introduce a novel Token Freezing and Reusing (ToFe) framework, in which we identify important tokens at each stage and temporarily freeze the unimportant ones, allowing their lagged reuse at a later stage. Specifically, we design a prediction module for token identification and an approximate module for recovering the frozen tokens. By jointly optimizing these modules with the backbone through computation-budget-aware end-to-end training, ToFe can adaptively process the necessary tokens at each block, reducing computational cost while maintaining performance. Extensive experiments demonstrate that ToFe reduces the computational cost of the LV-ViT model by 50% with less than a 2% drop in Top-1 accuracy, achieving a better trade-off between performance and complexity than state-of-the-art methods. Large-scale pre-trained vision transformer (ViT) models [37] have achieved remarkable progress in the field of vision tasks.
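
The key difference from irreversible pruning is bookkeeping: frozen tokens are set aside with their positions rather than discarded. The minimal sketch below, under stated assumptions, illustrates only that freeze-then-reuse flow. The importance scores are hypothetical and supplied by hand (ToFe learns a prediction module for this), a doubling function stands in for a transformer block, and frozen tokens are reinserted unchanged, whereas ToFe uses a learned approximate module to recover them. All names are illustrative.

```python
from typing import Callable, Dict, List, Tuple

Tokens = Dict[int, float]  # original position -> token (a scalar stands in for an embedding)


def freeze(tokens: List[float], scores: List[float], keep: int) -> Tuple[Tokens, Tokens]:
    """Split tokens into the `keep` highest-scoring (active) ones and the rest (frozen)."""
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    active_ids = set(ranked[:keep])
    active = {i: t for i, t in enumerate(tokens) if i in active_ids}
    frozen = {i: t for i, t in enumerate(tokens) if i not in active_ids}
    return active, frozen


def run_block(active: Tokens, block: Callable[[float], float]) -> Tokens:
    """Apply a (stand-in) transformer block to the active tokens only."""
    return {i: block(t) for i, t in active.items()}


def reuse(active: Tokens, frozen: Tokens) -> List[float]:
    """Reinsert frozen tokens at their original positions for a later stage.

    Identity recovery here; ToFe instead uses a learned approximate module.
    """
    merged = {**frozen, **active}
    return [merged[i] for i in sorted(merged)]


tokens = [1.0, 2.0, 3.0, 4.0]
scores = [0.9, 0.1, 0.8, 0.2]  # hypothetical importance scores
active, frozen = freeze(tokens, scores, keep=2)
active = run_block(active, lambda x: 2.0 * x)  # cheap stand-in for attention + MLP
print(reuse(active, frozen))  # [2.0, 2.0, 6.0, 4.0]
```

Only the two active tokens pay the block's cost, yet all four positions are available again for the later stage.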

large language model, machine learning, vision transformer

2507.1626

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

DAM: Dynamic Attention Mask for Long-Context Large Language Model Inference Acceleration

Zhang, Hanzhi, Fan, Heng, Sha, Kewei, Huang, Yan, Feng, Yunhe

arXiv.org Artificial Intelligence, Jun-16-2025

Long-context understanding is crucial for many NLP applications, yet transformers struggle with efficiency due to the quadratic complexity of self-attention. Sparse attention methods alleviate this cost but often impose static, predefined masks, failing to capture heterogeneous attention patterns. This results in suboptimal token interactions, limiting adaptability and retrieval accuracy in long-sequence tasks. This work introduces a dynamic sparse attention mechanism that assigns adaptive masks at the attention-map level, preserving heterogeneous patterns across layers and heads. Unlike existing approaches, our method eliminates the need for fine-tuning and predefined mask structures while maintaining computational efficiency. By learning context-aware attention structures, it achieves high alignment with full-attention models, ensuring minimal performance degradation while reducing memory and compute overhead. This approach provides a scalable alternative to full attention, enabling the practical deployment of Large Language Models (LLMs) without sacrificing retrieval performance. DAM is available at: https://github.com/HanzhiZhang-Ulrica/DAM.
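
To make the contrast between static and dynamic masks concrete, here is a generic per-query top-k sparse attention in which the mask is derived from the actual query/key scores, so each query (and, in a real model, each head and layer) gets its own pattern. This is a swapped-in, illustrative technique, not DAM's algorithm (the linked repository has the authors' implementation), and the names and `top_k` parameter are hypothetical. It also computes all causal logits before masking, so it shows the masking logic without the memory savings a real kernel would need.

```python
import math
from typing import List

Vec = List[float]


def _score(qi: Vec, kj: Vec) -> float:
    """Scaled dot-product logit."""
    return sum(a * b for a, b in zip(qi, kj)) / math.sqrt(len(qi))


def dynamic_topk_mask(q: List[Vec], k: List[Vec], top_k: int) -> List[List[bool]]:
    """Causal mask keeping, for each query, only its top_k (>= 1) highest-scoring keys."""
    mask = []
    for i, qi in enumerate(q):
        logits = [_score(qi, k[j]) for j in range(i + 1)]  # causal: keys 0..i
        ranked = sorted(range(i + 1), key=lambda j: logits[j], reverse=True)
        kept = set(ranked[:top_k])
        mask.append([j in kept for j in range(len(k))])
    return mask


def masked_attention(q: List[Vec], k: List[Vec], v: List[Vec],
                     mask: List[List[bool]]) -> List[Vec]:
    """Softmax attention restricted to the keys allowed by the mask."""
    out = []
    for i, qi in enumerate(q):
        kept = [j for j, allowed in enumerate(mask[i]) if allowed]
        logits = {j: _score(qi, k[j]) for j in kept}
        peak = max(logits.values())  # subtract max for numerical stability
        weights = {j: math.exp(s - peak) for j, s in logits.items()}
        total = sum(weights.values())
        out.append([sum(weights[j] / total * v[j][c] for j in kept)
                    for c in range(len(v[0]))])
    return out


q = [[1.0, 0.0], [1.0, 0.5], [0.2, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
mask = dynamic_topk_mask(q, k, top_k=2)
print(mask)  # [[True, False, False], [True, True, False], [False, True, True]]
print(masked_attention(q, k, v, mask))
```

Because `v` is the identity here, each output row equals that query's attention weights, which makes the effect of the mask easy to inspect: the third query assigns zero weight to the first key.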

arxiv preprint arxiv, large language model, machine learning

2506.11104

Country:

North America > United States > Texas (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

Convolutional Rectangular Attention Module

Nguyen, Hai-Vy, Gamboa, Fabrice, Zhang, Sixin, Chhaibi, Reda, Gratton, Serge, Giaccone, Thierry

arXiv.org Machine Learning, Mar-13-2025

In this paper, we introduce a novel spatial attention module that can be integrated into any convolutional network. This module guides the model to attend to the most discriminative part of an image, enabling it to attain better performance through end-to-end training. In standard approaches, a spatial attention map is generated in a position-wise fashion. We observe that this results in very irregular boundaries, which could make it difficult to generalize to new samples. In our method, the attention region is constrained to be rectangular. This rectangle is parametrized by only 5 parameters, allowing for better stability and generalization to new samples. In our experiments, our method systematically outperforms its position-wise counterpart. Thus, it provides a novel and useful spatial attention mechanism for convolutional models. In addition, our module provides interpretability regarding the "where to look" question, as it reveals the part of the input on which the model focuses to produce its prediction.
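
The abstract says the rectangle has 5 parameters but does not name them. Assuming, hypothetically, a center (cx, cy), a half-width, a half-height, and a rotation angle, a soft rectangular attention map can be built as a product of two sigmoids in the rectangle's rotated frame. This is an illustrative sketch, not the paper's exact formulation; the fixed `sharpness` hyperparameter and all names are assumptions. Because the mask is a smooth function of the parameters (differentiable almost everywhere), they could be learned end to end, and the map has clean boundaries by construction.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def rectangular_mask(height: int, width: int, cx: float, cy: float,
                     half_w: float, half_h: float, theta: float,
                     sharpness: float = 20.0) -> List[List[float]]:
    """Soft rotated-rectangle attention map on a height x width grid.

    (cx, cy, half_w, half_h, theta) are the assumed 5 parameters, in
    coordinates normalized to [0, 1]; `sharpness` is a fixed hyperparameter.
    """
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    mask = []
    for r in range(height):
        y = (r + 0.5) / height  # pixel-center coordinates
        row = []
        for c in range(width):
            x = (c + 0.5) / width
            dx, dy = x - cx, y - cy
            # rotate the offset into the rectangle's own axes
            u = cos_t * dx + sin_t * dy
            v = -sin_t * dx + cos_t * dy
            row.append(sigmoid(sharpness * (half_w - abs(u)))
                       * sigmoid(sharpness * (half_h - abs(v))))
        mask.append(row)
    return mask


features = [[1.0] * 8 for _ in range(8)]  # toy single-channel feature map
mask = rectangular_mask(8, 8, cx=0.5, cy=0.5, half_w=0.2, half_h=0.2, theta=0.0)
attended = [[f * m for f, m in zip(frow, mrow)] for frow, mrow in zip(features, mask)]
```

Values near 1 mark the region the model "looks at", which is what gives the module its interpretability.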

attention map, attention module, information

2503.10875

Country:

Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)
North America > United States > New York (0.04)
Europe > France > Provence-Alpes-Côte d'Azur (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Cross-Encoder Rediscovers a Semantic Variant of BM25

Lu, Meng, Chen, Catherine, Eickhoff, Carsten

arXiv.org Artificial Intelligence, Feb-6-2025

Neural Ranking Models (NRMs) have rapidly advanced state-of-the-art performance on information retrieval tasks. In this work, we investigate a Cross-Encoder variant of MiniLM to determine which relevance features it computes and where they are stored. We find that it employs a semantic variant of the traditional BM25 in an interpretable manner, featuring localized components: (1) Transformer attention heads that compute soft term frequency while controlling for term saturation and document length effects, and (2) a low-rank component of its embedding matrix that encodes inverse document frequency information for the vocabulary. This suggests that the Cross-Encoder uses the same fundamental mechanisms as BM25 but further leverages their capacity to capture semantics for improved retrieval performance. This granular understanding lays the groundwork for model editing to enhance model transparency, address safety concerns, and improve scalability in training and real-world applications.
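
For comparison with the components named above (term frequency with saturation, document-length normalization, and inverse document frequency), here is classic lexical BM25. It uses exact term matches, whereas the paper reports that the Cross-Encoder computes a soft, semantic version. The Lucene-style IDF with a +1 inside the log and the defaults `k1=1.2`, `b=0.75` are common choices assumed here, not taken from the paper.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: List[str], docs: List[List[str]],
                k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Score every tokenized document against a tokenized query with BM25."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    df = Counter(term for d in docs for term in set(d))  # document frequency
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            # inverse document frequency (Lucene-style, never negative)
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # term frequency that saturates (k1) and is normalized by length (b)
            norm_tf = tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(d) / avgdl))
            score += idf * norm_tf
        scores.append(score)
    return scores


docs = [["cat", "sat", "mat"], ["dog", "sat"], ["cat", "cat", "cat"]]
print(bm25_scores(["cat"], docs))  # doc 1 scores 0; doc 2 scores highest
```

Tripling the count of "cat" in the third document raises its score by well under 3x, which is the saturation effect the paper finds attention heads controlling for.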

information retrieval, machine learning, natural language

2502.04645

Country: