AITopics

Technology: Information Technology > Artificial Intelligence (1.00)

Bilehsavar, Meysam Shirdel, Ghaedi, Razieh, Taheri, Samira Seyed, Fan, Xinqi, O'Reilly, Christian

SACA: Selective Attention-Based Clustering Algorithm

arXiv.org Artificial IntelligenceDec-1-2025

Clustering algorithms are fundamental tools across many fields, with density-based methods offering particular advantages in identifying arbitrarily shaped clusters and handling noise. However, their effectiveness is often limited by the requirement of critical parameter tuning by users, which typically requires significant domain expertise. This paper introduces a novel density-based clustering algorithm loosely inspired by the concept of selective attention, designed to minimize reliance on parameter tuning for most applications. The proposed method computes an adaptive threshold to exclude sparsely distributed points and outliers, constructs an initial cluster framework, and subsequently reintegrates the filtered points to refine the final results. Extensive experiments on diverse benchmark datasets demonstrate the robustness, accuracy, and ease of use of the proposed approach, establishing it as a powerful alternative to conventional density-based clustering techniques.

algorithm, artificial intelligence, machine learning, (18 more...)

2508.1715

Country: North America > United States (0.46)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Neural Information Processing SystemsNov-21-2025, 16:07:01 GMT

Attend and Predict: Understanding Gene Regulation by Selective Attention on Chromatin

The past decade has seen a revolution in genomic technologies that enabled a flood of genome-wide profiling of chromatin marks. Recent literature tried to understand gene regulation by predicting gene expression from large-scale chromatin measurements. Two fundamental challenges exist for such learning tasks: (1) genome-wide chromatin signals are spatially structured, high-dimensional and highly modular; and (2) the core aim is to understand what are the relevant factors and how they work together. Previous studies either failed to model complex dependencies among input signals or relied on separate feature analysis to explain the decisions. This paper presents an attention-based deep learning approach; AttentiveChrome, that uses a unified architecture to model and to interpret dependencies among chromatin factors for controlling gene regulation. AttentiveChrome uses a hierarchy of multiple Long Short-Term Memory (LSTM) modules to encode the input signals and to model how various chromatin marks cooperate automatically. AttentiveChrome trains two levels of attention jointly with the target prediction, enabling it to attend differentially to relevant marks and to locate important positions per mark. We evaluate the model across 56 different cell types (tasks) in human. Not only is the proposed architecture more accurate, but its attention scores also provide a better interpretation than state-of-the-art feature visualization methods such as saliency map.

gene regulation, name change, selective attention, (5 more...)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceNov-4-2025

Optimizing Native Sparse Attention with Latent Attention and Local Global Alternating Strategies

Hu, Yuxuan, Tan, Jianchao, Zhang, Jiaqi, Zan, Wen, Sun, Pingwei, Lu, Yifan, Sun, Yerui, Xie, Yuchen, Cai, Xunliang, Zhang, Jing

In this work, we conduct a systematic analysis of Native Sparse Attention (NSA) and propose targeted improvements that enhance long-context modeling. A key insight is that alternating between local (sliding-window) and global (compression, selective) attention across layers, rather than using fixed patterns, enables more effective propagation of long-range dependencies and substantially boosts performance on long-sequence tasks. Meanwhile, we further refine NSA's branches with Latent Attention that the sliding-window branch is enhanced with Multi-head Latent Attention (MLA) while compression and selective branches adopt Group-head Latent Attention (GLA). These changes reduce KV-cache memory by 50\% versus NSA while improving the model's common-sense reasoning and long-text understanding capabilities. Experiments on models from 340M to 1.3B parameters (trained on 15B and 100B tokens) show our method matches or exceeds full attention and native sparse attention in both common-sense reasoning and long-context understanding tasks.

artificial intelligence, machine learning, preprint, (19 more...)

2511.00819

Country: Asia > China (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Neural Information Processing SystemsOct-9-2025, 19:09:54 GMT

Selective Attention: Enhancing Transformer through Principled Context Control

The attention mechanism within the transformer architecture enables the model to weigh and combine tokens based on their relevance to the query.

arxiv preprint arxiv, attention map, experiment, (14 more...)

Country:

North America > United States > Michigan (0.04)
North America > United States > California > Riverside County > Riverside (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing SystemsMay-26-2025, 16:56:54 GMT

Selective Attention: Enhancing Transformer through Principled Context Control

large language model, machine learning, principled context control, (7 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.42)

arXiv.org Artificial IntelligenceFeb-28-2025

A Method of Selective Attention for Reservoir Based Agents

McKee, Kevin

Training of deep reinforcement learning agents is slowed considerably by the presence of input dimensions that do not usefully condition the reward function. Existing modules such as layer normalization can be trained with weight decay to act as a form of selective attention, i.e. an input mask, that shrinks the scale of unnecessary inputs, which in turn accelerates training of the policy. However, we find a surprising result that adding numerous parameters to the computation of the input mask results in much faster training. A simple, high dimensional masking module is compared with layer normalization and a model without any input suppression. The high dimensional mask resulted in a four-fold speedup in training over the null hypothesis and a two-fold speedup in training over the layer normalization method.

efficiency, layer normalization, selective attention, (12 more...)

2502.21229

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial IntelligenceNov-19-2024

Selective Attention: Enhancing Transformer through Principled Context Control

Zhang, Xuechen, Chang, Xiangyu, Li, Mingchen, Roy-Chowdhury, Amit, Chen, Jiasi, Oymak, Samet

The attention mechanism within the transformer architecture enables the model to weigh and combine tokens based on their relevance to the query. While self-attention has enjoyed major success, it notably treats all queries $q$ in the same way by applying the mapping $V^\top\text{softmax}(Kq)$, where $V,K$ are the value and key embeddings respectively. In this work, we argue that this uniform treatment hinders the ability to control contextual sparsity and relevance. As a solution, we introduce the $\textit{Selective Self-Attention}$ (SSA) layer that augments the softmax nonlinearity with a principled temperature scaling strategy. By controlling temperature, SSA adapts the contextual sparsity of the attention map to the query embedding and its position in the context window. Through theory and experiments, we demonstrate that this alleviates attention dilution, aids the optimization process, and enhances the model's ability to control softmax spikiness of individual queries. We also incorporate temperature scaling for value embeddings and show that it boosts the model's ability to suppress irrelevant/noisy tokens. Notably, SSA is a lightweight method which introduces less than 0.5% new parameters through a weight-sharing strategy and can be fine-tuned on existing LLMs. Extensive empirical evaluations demonstrate that SSA-equipped models achieve a noticeable and consistent accuracy improvement on language modeling benchmarks.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

2411.12892

Country:

North America > United States > Michigan (0.04)
North America > United States > California > Riverside County > Riverside (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing SystemsOct-8-2024, 10:07:17 GMT

Reviews: Attend and Predict: Understanding Gene Regulation by Selective Attention on Chromatin

The paper presents a novel method for predicting gene regulation by LSTM with an attention mechanism. The model consists of two levels, where the first level is applied on bins for each histone modifications (HM) and the second level is applied to multiple HMs. Attention mechanism is used in each level to focus on the important parts of the bins and HMs. In the experiments, the proposed method improves AUC scores over baseline models including CNN, LSTM, and CNN with an attention mechanism. This is an interesting paper which shows that LSTM with an attention mechanism can predict gene regulation.

attention mechanism, gene regulation, selective attention, (10 more...)

Genre: Research Report (0.45)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.87)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Leviathan, Yaniv, Kalman, Matan, Matias, Yossi

Selective Attention Improves Transformer

arXiv.org Artificial IntelligenceOct-3-2024

We introduce Selective Attention, a simple parameter-free change to the standard attention mechanism which reduces attention to unneeded elements. Selective attention improves language modeling performance in a variety of model sizes and context lengths. For example, a range of transformers trained with the language modeling objective on C4 with selective attention perform equivalently to standard transformers with 2X more heads and parameters in their attention modules. Selective attention also allows decreasing the size of the attention's context buffer, leading to meaningful reductions in the memory and compute requirements during inference. For example, transformers with 100M parameters trained on C4 with context sizes of 512, 1,024, and 2,048 need 16X, 25X, and 47X less memory for their attention module, respectively, when equipped with selective attention, as those without selective attention, with the same validation perplexity. Different tasks have different memory requirements. On one extreme, copying an arbitrary sequence requires retaining all sequence elements in memory. On the other extreme, determining whether a specific element appeared at least once, only requires persisting a constant amount of memory. Transformers (Vaswani et al., 2017) keep the entire history in their context buffers, allowing them to solve tasks such as copying, while famously leading to their squared attention cost. RNNs (Rumelhart et al., 1986) and their modern structured state space variants (Gu et al., 2022; Gu & Dao, 2024) keep only a constant-sized sketch of the history, making inference cost linear, but rendering them incapable of solving tasks such as arbitrary string copying.

arxiv, selective attention, transformer, (13 more...)

2410.02703

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Asia > China > Hong Kong (0.04)
Asia > China > Guangxi Province > Nanning (0.04)

Genre: Research Report (0.54)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)