AITopics | attention state

attention state

RoSA: Enhancing Parameter-Efficient Fine-Tuning via RoPE-aware Selective Adaptation in Large Language Models

Pan, Dayan, Wang, Jingyuan, Zhou, Yilong, Cheng, Jiawei, Jia, Pengyue, Zhao, Xiangyu

arXiv.org Artificial Intelligence, Dec-1-2025

Fine-tuning large language models is essential for task-specific adaptation, yet it remains computationally prohibitive. Parameter-Efficient Fine-Tuning (PEFT) methods have emerged as a solution, but current approaches typically ignore the distinct roles of model components and the heterogeneous importance across layers, thereby limiting adaptation efficiency. Motivated by the observation that Rotary Position Embeddings (RoPE) induce critical activations in the low-frequency dimensions of attention states, we propose RoPE-aware Selective Adaptation (RoSA), a novel PEFT framework that allocates trainable parameters in a more targeted and effective manner. RoSA comprises a RoPE-aware Attention Enhancement (RoAE) module, which selectively enhances the low-frequency components of RoPE-influenced attention states, and a Dynamic Layer Selection (DLS) strategy that adaptively identifies and updates the most critical layers based on LayerNorm gradient norms. By combining dimension-wise enhancement with layer-wise adaptation, RoSA achieves more targeted and efficient fine-tuning. Extensive experiments on fifteen commonsense and arithmetic benchmarks demonstrate that RoSA outperforms existing mainstream PEFT methods under comparable trainable parameters.
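
The two selection ideas in this abstract can be illustrated with a minimal sketch. It assumes the standard RoPE frequency schedule (base 10000, one frequency per rotated dimension pair); the `fraction` parameter, the function names, and the plain top-k layer choice are hypothetical illustrations, not the RoSA implementation.

```python
def rope_frequencies(head_dim, base=10000.0):
    """Standard RoPE angular frequencies, one per rotated dimension pair."""
    if head_dim % 2 != 0:
        raise ValueError("head_dim must be even")
    return [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]


def low_frequency_pairs(head_dim, fraction=0.25, base=10000.0):
    """Indices of the dimension pairs with the lowest rotation frequencies.

    `fraction` is a hypothetical knob: the share of pairs to treat as
    low-frequency.
    """
    freqs = rope_frequencies(head_dim, base)
    k = max(1, int(len(freqs) * fraction))
    ranked = sorted(range(len(freqs)), key=lambda i: freqs[i])
    return sorted(ranked[:k])


def top_layers(grad_norms, k):
    """Indices of the k layers with the largest gradient norms (assumed rule)."""
    ranked = sorted(range(len(grad_norms)), key=lambda i: grad_norms[i], reverse=True)
    return sorted(ranked[:k])
```

In this schedule the frequency decreases as the pair index grows, so the low-frequency pairs are the highest-indexed ones. How a pair index maps to concrete tensor columns depends on the model's RoPE layout (interleaved or half-split).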

arxiv preprint arxiv, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2511.21733

Country: Asia > China (0.29)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Post-processing of EEG-based Auditory Attention Decoding Decisions via Hidden Markov Models

Heintz, Nicolas, Francart, Tom, Bertrand, Alexander

arXiv.org Artificial Intelligence, Jul-1-2025

Auditory attention decoding (AAD) algorithms exploit brain signals, such as electroencephalography (EEG), to identify which speaker a listener is focusing on in a multi-speaker environment. While state-of-the-art AAD algorithms can identify the attended speaker on short time windows, their predictions are often too inaccurate for practical use. In this work, we propose augmenting AAD with a hidden Markov model (HMM) that models the temporal structure of attention. More specifically, the HMM relies on the fact that a subject is much less likely to switch attention than to keep attending the same speaker at any moment in time. We show how an HMM can significantly improve existing AAD algorithms in both causal (real-time) and non-causal (offline) settings. We further demonstrate that HMMs outperform existing postprocessing approaches in both accuracy and responsiveness, and explore how various factors such as window length, switching frequency, and AAD accuracy influence overall performance. The proposed method is computationally efficient, intuitive to use and applicable in both real-time and offline settings.

Accurately detecting to whom someone wishes to listen is of crucial importance for a wide array of applications. For example, this would allow a hearing aid to determine which speakers should be enhanced or suppressed [1]-[4]. This problem can potentially be solved by decoding the auditory attention from brain signals using electroencephalography (EEG) [5]-[9]. The most common and reliable method to decode attention from the neural response is based on stimulus reconstruction [3], [5]-[7], [10]. This method is based on the observation that the brain tracks attended speech more than unattended speech [11], [12]. The goal is to train a decoder that reconstructs the temporal variations in the attended speech signal (e.g., its amplitude envelope) from the EEG data.
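
The "sticky" temporal prior described above can be sketched as a two-state causal forward filter. This is a minimal illustration under stated assumptions, not the authors' implementation: the observations are taken to be hard per-window AAD decisions (0 or 1), and the `accuracy` and `p_stay` values are hypothetical.

```python
def hmm_forward_filter(decisions, accuracy=0.7, p_stay=0.95):
    """Causal posterior P(attended speaker == 1) after each window.

    decisions: per-window AAD outputs (0 or 1), assumed correct with
               probability `accuracy` independently of the window.
    p_stay:    assumed probability that attention does not switch
               between consecutive windows.
    """
    belief = [0.5, 0.5]  # uniform prior over the two speakers
    posteriors = []
    for d in decisions:
        # Prediction step: apply the sticky transition matrix.
        pred = [
            p_stay * belief[0] + (1 - p_stay) * belief[1],
            (1 - p_stay) * belief[0] + p_stay * belief[1],
        ]
        # Update step: weight each state by the likelihood of the decision.
        like = [accuracy if d == s else 1 - accuracy for s in (0, 1)]
        unnorm = [pred[s] * like[s] for s in (0, 1)]
        total = sum(unnorm)
        belief = [u / total for u in unnorm]
        posteriors.append(belief[1])
    return posteriors


def smooth_decisions(decisions, accuracy=0.7, p_stay=0.95):
    """Threshold the filtered posterior at 0.5 to obtain cleaned decisions."""
    return [int(p > 0.5) for p in hmm_forward_filter(decisions, accuracy, p_stay)]
```

With these settings a single isolated flip is absorbed by the prior, while a sustained run of opposite decisions eventually moves the estimate to the other speaker. A non-causal (offline) variant would add a backward pass over the same model.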

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2506.24024

Country:

Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.05)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Debiasing CLIP: Interpreting and Correcting Bias in Attention Heads

Yeo, Wei Jie, Mao, Rui, Abdar, Moloud, Cambria, Erik, Satapathy, Ranjan

arXiv.org Artificial Intelligence, May-26-2025

Multimodal models like CLIP have gained significant attention due to their remarkable zero-shot performance across various tasks. However, studies have revealed that CLIP can inadvertently learn spurious associations between target variables and confounding factors. To address this, we introduce Locate-Then-Correct (LTC), a contrastive framework that identifies spurious attention heads in Vision Transformers via mechanistic insights and mitigates them through targeted ablation. Furthermore, LTC identifies salient, task-relevant attention heads, enabling the integration of discriminative features through orthogonal projection to improve classification performance. We evaluate LTC on benchmarks with inherent background and gender biases, achieving a gain of over 50% in worst-group accuracy compared to non-training post-hoc baselines. Additionally, we visualize the representation of selected heads and find that the presented interpretation corroborates our contrastive mechanism for identifying both spurious and salient attention heads. Code available at https://github.com/wj210/CLIP_LTC.
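
Head ablation in general can be sketched as follows, assuming the final image representation decomposes into a sum of per-head contributions. The function name, the zero or mean replacement rule, and the data layout are hypothetical and are not taken from the LTC code base linked above.

```python
def ablate_heads(head_outputs, spurious_heads, mean_outputs=None):
    """Sum per-head contributions, replacing the flagged heads.

    head_outputs:   list of equal-length vectors, one per attention head.
    spurious_heads: set of head indices to ablate.
    mean_outputs:   optional per-head mean vectors; when given, a flagged
                    head is replaced by its mean, otherwise by zeros.
    """
    dim = len(head_outputs[0])
    total = [0.0] * dim
    for h, vec in enumerate(head_outputs):
        if h in spurious_heads:
            vec = mean_outputs[h] if mean_outputs is not None else [0.0] * dim
        for j in range(dim):
            total[j] += vec[j]
    return total
```

Zero ablation removes a head's contribution entirely; mean ablation keeps its average effect while discarding the input-specific part.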

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2505.17425

Country: North America > United States (0.46)

Genre: Research Report (0.64)

Industry:

Health & Medicine (0.68)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Confidential Prompting: Protecting User Prompts from Cloud LLM Providers

Gim, In, Li, Caihua, Zhong, Lin

arXiv.org Artificial Intelligence, Nov-28-2024

Our work tackles the challenge of securing user inputs in cloud-hosted large language model (LLM) serving while ensuring output invariance, model confidentiality, and compute efficiency. We introduce secure multi-party decoding (SMD), which leverages confidential computing to confine user prompts to a trusted execution environment (TEE), namely a confidential virtual machine (CVM), while allowing service providers to generate tokens efficiently. We also introduce a novel cryptographic method, prompt obfuscation (PO), to ensure robustness against reconstruction attacks on SMD. We demonstrate that our approach preserves both prompt confidentiality and LLM serving efficiency. Our solution can enable privacy-preserving cloud LLM serving that handles sensitive prompts, such as clinical records, financial data, and personal information.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2409.19134

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Europe > Croatia > Dubrovnik-Neretva County > Dubrovnik (0.04)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Law (0.93)
Information Technology > Services (0.93)
Health & Medicine > Health Care Technology > Medical Record (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Improving How Agents Cooperate: Attention Schemas in Artificial Neural Networks

Farrell, Kathryn T., Ziman, Kirsten, Graziano, Michael S. A.

arXiv.org Artificial Intelligence, Nov-1-2024

Growing evidence suggests that the brain uses an "attention schema" to monitor, predict, and help control attention. It has also been suggested that an attention schema improves social intelligence by allowing one person to better predict another. Given their potential advantages, attention schemas have been increasingly tested in machine learning. Here we test small deep learning networks to determine how the addition of an attention schema may affect performance on a range of tasks. First, we found that an agent with an attention schema is better at judging or categorizing the attention states of other agents. Second, we found that an agent with an attention schema develops a pattern of attention that is easier for other agents to judge and categorize. Third, we found that in a joint task where two agents paint a scene together and must predict each other's behavior for best performance, adding an attention schema improves that performance. Finally, we find that the performance improvements caused by an attention schema are not a non-specific result of an increase in network complexity. Not all performance, on all tasks, is improved. Instead, improvement is specific to "social" tasks involving judging, categorizing, or predicting the attention of other agents. These results suggest that an attention schema may be useful in machine learning for improving cooperativity and social behavior.

artificial intelligence, attention schema, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2411.00983

Country:

North America > United States > New York (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Switzerland (0.04)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Exploration of LLMs, EEG, and behavioral data to measure and support attention and sleep

Sano, Akane, Amores, Judith, Czerwinski, Mary

arXiv.org Artificial Intelligence, Aug-1-2024

We explore the application of large language models (LLMs), models pre-trained on massive textual data, to detecting and improving these altered states. We investigate the use of LLMs to estimate attention states, sleep stages, and sleep quality, and to generate sleep improvement suggestions and adaptive guided imagery scripts based on electroencephalogram (EEG) and physical activity data (e.g., waveforms, power spectrogram images, numerical features). Our results show that LLMs can estimate sleep quality based on human textual behavioral features and provide personalized sleep improvement suggestions and guided imagery scripts; however, detecting attention, sleep stages, and sleep quality based on EEG and activity data requires further training data and domain-specific knowledge.

llm, participant, sleep quality, (16 more...)

arXiv.org Artificial Intelligence

2408.07822

Country:

North America > United States (0.05)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report > New Finding (0.86)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Consumer Health (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

When Neural Code Completion Models Size up the Situation: Attaining Cheaper and Faster Completion through Dynamic Model Inference

Sun, Zhensu, Du, Xiaoning, Song, Fu, Wang, Shangwen, Li, Li

arXiv.org Artificial Intelligence, Jan-18-2024

Leveraging recent advancements in large language models, modern neural code completion models have demonstrated the capability to generate highly accurate code suggestions. However, their massive size poses challenges in terms of computational costs and environmental impact, hindering their widespread adoption in practical scenarios. Dynamic inference emerges as a promising solution, as it allocates minimal computation during inference while maintaining the model's performance. In this research, we explore dynamic inference within the context of code completion. Initially, we conducted an empirical investigation on GPT-2, focusing on the inference capabilities of intermediate layers for code completion. We found that 54.4% of tokens can be accurately generated using just the first layer, indicating substantial potential for computational savings. Moreover, despite using all layers, the model still fails to predict 14.5% of tokens correctly, and the subsequent completions continued from them are rarely considered helpful, with only a 4.2% Acceptance Rate. These findings motivate our exploration of dynamic inference in code completion and inspire us to enhance it with a decision-making mechanism that stops the generation of incorrect code. We thus propose a novel dynamic inference method specifically tailored for code completion models. This method aims not only to produce correct predictions with largely reduced computation but also to prevent incorrect predictions proactively. Our extensive evaluation shows that it skips, on average, 1.7 of the 16 layers in the models, leading to an 11.2% speedup with only a marginal 1.1% reduction in ROUGE-L.
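
Dynamic inference of this kind is often illustrated with confidence-threshold early exit, sketched below. This is a common baseline rather than the method proposed in the paper; the `threshold` value, the function name, and the assumption that every layer exposes a token distribution are hypothetical.

```python
def early_exit(layer_probs, threshold=0.9):
    """Return (token_index, layers_used) for one decoding step.

    layer_probs: one probability distribution over the vocabulary per
                 layer, ordered from the first layer to the last.
    The step stops at the first layer whose top probability reaches
    `threshold`; otherwise the last layer's prediction is used.
    """
    if not layer_probs:
        raise ValueError("layer_probs must contain at least one layer")
    best, depth = 0, 0
    for depth, probs in enumerate(layer_probs, start=1):
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            break
    return best, depth
```

For scale, skipping 1.7 of 16 layers corresponds to about 10.6% of the layer computation per token.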

code completion, completion, lcm, (14 more...)

arXiv.org Artificial Intelligence

2401.09964

Country:

Europe > Portugal > Lisbon > Lisbon (0.05)
Asia > China > Beijing > Beijing (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(3 more...)

Genre: Research Report > New Finding (1.00)

Industry: Energy (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

XtremeDistilTransformers: Task Transfer for Task-agnostic Distillation

Mukherjee, Subhabrata, Awadallah, Ahmed Hassan, Gao, Jianfeng

arXiv.org Artificial Intelligence, Jun-11-2021

While deep and large pre-trained models are the state of the art for various natural language processing tasks, their huge size poses significant challenges for practical use in resource-constrained settings. Recent works in knowledge distillation propose task-agnostic as well as task-specific methods to compress these models, with task-specific ones often yielding a higher compression rate. In this work, we develop a new task-agnostic distillation framework, XtremeDistilTransformers, that leverages the advantage of task-specific methods for learning a small universal model that can be applied to arbitrary tasks and languages. To this end, we study the transferability of several source tasks, augmentation resources and model architecture for distillation. We evaluate our model performance on multiple tasks, including the General Language Understanding Evaluation (GLUE) benchmark, SQuAD question answering dataset and a massive multi-lingual NER dataset with 41 languages. We release three distilled task-agnostic checkpoints with 13MM, 22MM and 33MM parameters obtaining SOTA performance in several tasks.
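
For readers unfamiliar with knowledge distillation, the classic soft-target objective can be sketched as below. This is the generic temperature-scaled KL loss, not the specific XtremeDistilTransformers objective; the `temperature` default and function names are assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2
```

The loss is zero when the student reproduces the teacher's logits and grows as the two softened distributions diverge.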

computational linguistic, distillation, xtremedistiltransformer, (15 more...)

arXiv.org Artificial Intelligence

2106.04563

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)
Europe > Italy > Tuscany > Florence (0.04)
Oceania > Australia (0.04)
(4 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Epistemic Planning with Attention as a Bounded Resource

Belardinelli, Gaia, Rendsvig, Rasmus K.

arXiv.org Artificial Intelligence, May-20-2021

Where information grows abundant, attention becomes a scarce resource. As a result, agents must plan wisely how to allocate their attention in order to achieve epistemic efficiency. Here, we present a framework for multi-agent epistemic planning with attention, based on Dynamic Epistemic Logic (DEL, a powerful formalism for epistemic planning). We identify the framework as a fragment of standard DEL, and consider its plan existence problem. While in the general case undecidable, we show that when attention is required for learning, all instances of the problem are decidable.

attention state, def, iff, (15 more...)

arXiv.org Artificial Intelligence

2105.09976

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Denmark > Capital Region > Copenhagen (0.04)

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Seeing Machines Technology Enables GM Super Cruise Driver Assistance System

#artificialintelligence, Oct-23-2017, 03:10:21 GMT

Seeing Machines (AIM: SEE), an industry leader in computer vision technologies which enable machines to see, understand and assist people, announces the automotive production debut of its FOVIO driver monitoring technology in the 2018 Cadillac CT6. The FOVIO-based driver monitoring system (DMS) forms an integral part of General Motors' industry-leading Super Cruise hands-free driving system for the highway, ensuring safe and confident vehicle operation. Reliable driver monitoring is critical in hands-free driving systems, which must keep drivers engaged and prepared to retake control of the vehicle when required. The Cadillac Super Cruise system uses FOVIO vision technology, developed by Seeing Machines, to enable a gumdrop-sized infrared camera on the steering wheel column to accurately determine the driver's attention state. The driver's attention state is determined by precisely measuring head orientation and eyelid movements under a full range of daytime and night-time driving conditions, including when the driver wears sunglasses.

artificial intelligence, hand-free driving system, super cruise driver assistance system, (10 more...)

#artificialintelligence

Country: Oceania > Australia (0.20)

Industry:

Automobiles & Trucks > Manufacturer (1.00)
Transportation > Ground > Road (0.93)

Technology: Information Technology > Artificial Intelligence > Vision (0.38)
