attention state
RoSA: Enhancing Parameter-Efficient Fine-Tuning via RoPE-aware Selective Adaptation in Large Language Models
Pan, Dayan, Wang, Jingyuan, Zhou, Yilong, Cheng, Jiawei, Jia, Pengyue, Zhao, Xiangyu
Fine-tuning large language models is essential for task-specific adaptation, yet it remains computationally prohibitive. Parameter-Efficient Fine-Tuning (PEFT) methods have emerged as a solution, but current approaches typically ignore the distinct roles of model components and the heterogeneous importance across layers, thereby limiting adaptation efficiency. Motivated by the observation that Rotary Position Embeddings (RoPE) induce critical activations in the low-frequency dimensions of attention states, we propose RoPE-aware Selective Adaptation (RoSA), a novel PEFT framework that allocates trainable parameters in a more targeted and effective manner. RoSA comprises a RoPE-aware Attention Enhancement (RoAE) module, which selectively enhances the low-frequency components of RoPE-influenced attention states, and a Dynamic Layer Selection (DLS) strategy that adaptively identifies and updates the most critical layers based on LayerNorm gradient norms. By combining dimension-wise enhancement with layer-wise adaptation, RoSA achieves more targeted and efficient fine-tuning. Extensive experiments on fifteen commonsense and arithmetic benchmarks demonstrate that RoSA outperforms existing mainstream PEFT methods under comparable trainable parameters.
Post-processing of EEG-based Auditory Attention Decoding Decisions via Hidden Markov Models
Heintz, Nicolas, Francart, Tom, Bertrand, Alexander
--Auditory attention decoding (AAD) algorithms exploit brain signals, such as electroencephalography (EEG), to identify which speaker a listener is focusing on in a multi-speaker environment. While state-of-the-art AAD algorithms can identify the attended speaker on short time windows, their predictions are often too inaccurate for practical use. In this work, we propose augmenting AAD with a hidden Markov model (HMM) that models the temporal structure of attention. More specifically, the HMM relies on the fact that a subject is much less likely to switch attention than to keep attending the same speaker at any moment in time. We show how a HMM can significantly improve existing AAD algorithms in both causal (real-time) and non-causal (offline) settings. We further demonstrate that HMMs outperform existing postprocessing approaches in both accuracy and responsiveness, and explore how various factors such as window length, switching frequency, and AAD accuracy influence overall performance. The proposed method is computationally efficient, intuitive to use and applicable in both real-time and offline settings. Accurately detecting to whom someone wishes to listen is of crucial importance for a wide array of applications. For example, this would allow a hearing aid to determine which speakers should be enhanced or suppressed [1]-[4]. This problem can potentially be solved by decoding the auditory attention from brain signals using electroencephalography (EEG) [5]-[9]. The most common and reliable method to decode attention from the neural response is based on stimulus reconstruction [3], [5]-[7], [10]. This method is based on the observation that the brain tracks attended speech more than unattended speech [11], [12]. The goal is to train a decoder that reconstructs the temporal variations in the attended speech signal (e.g., its amplitude envelope) from the EEG data.
Debiasing CLIP: Interpreting and Correcting Bias in Attention Heads
Yeo, Wei Jie, Mao, Rui, Abdar, Moloud, Cambria, Erik, Satapathy, Ranjan
Multimodal models like CLIP have gained significant attention due to their remarkable zero-shot performance across various tasks. However, studies have revealed that CLIP can inadvertently learn spurious associations between target variables and confounding factors. To address this, we introduce \textsc{Locate-Then-Correct} (LTC), a contrastive framework that identifies spurious attention heads in Vision Transformers via mechanistic insights and mitigates them through targeted ablation. Furthermore, LTC identifies salient, task-relevant attention heads, enabling the integration of discriminative features through orthogonal projection to improve classification performance. We evaluate LTC on benchmarks with inherent background and gender biases, achieving over a $>50\%$ gain in worst-group accuracy compared to non-training post-hoc baselines. Additionally, we visualize the representation of selected heads and find that the presented interpretation corroborates our contrastive mechanism for identifying both spurious and salient attention heads. Code available at https://github.com/wj210/CLIP_LTC.
Confidential Prompting: Protecting User Prompts from Cloud LLM Providers
Gim, In, Li, Caihua, Zhong, Lin
Our work tackles the challenge of securing user inputs in cloud-hosted large language model (LLM) serving while ensuring output invariance, model confidentiality, and compute efficiency. We introduce secure multi-party decoding (SMD), which leverages confidential computing to confine user prompts to a trusted execution environment (TEE), namely a confidential virtual machine (CVM), while allowing service providers to generate tokens efficiently. We also introduce a novel cryptographic method, prompt obfuscation (PO), to ensure robustness against reconstruction attacks on SMD. We demonstrate that our approach preserves both prompt confidentiality and LLM serving efficiency. Our solution can enable privacy-preserving cloud LLM serving that handles sensitive prompts, such as clinical records, financial data, and personal information.
Improving How Agents Cooperate: Attention Schemas in Artificial Neural Networks
Farrell, Kathryn T., Ziman, Kirsten, Graziano, Michael S. A.
Growing evidence suggests that the brain uses an "attention schema" to monitor, predict, and help control attention. It has also been suggested that an attention schema improves social intelligence by allowing one person to better predict another. Given their potential advantages, attention schemas have been increasingly tested in machine learning. Here we test small deep learning networks to determine how the addition of an attention schema may affect performance on a range of tasks. First, we found that an agent with an attention schema is better at judging or categorizing the attention states of other agents. Second, we found that an agent with an attention schema develops a pattern of attention that is easier for other agents to judge and categorize. Third, we found that in a joint task where two agents paint a scene together and must predict each other's behavior for best performance, adding an attention schema improves that performance. Finally, we find that the performance improvements caused by an attention schema are not a non-specific result of an increase in network complexity. Not all performance, on all tasks, is improved. Instead, improvement is specific to "social" tasks involving judging, categorizing, or predicting the attention of other agents. These results suggest that an attention schema may be useful in machine learning for improving cooperativity and social behavior.
Exploration of LLMs, EEG, and behavioral data to measure and support attention and sleep
Sano, Akane, Amores, Judith, Czerwinski, Mary
We explore the application of large language models (LLMs), pre-trained models with massive textual data for detecting and improving these altered states. We investigate the use of LLMs to estimate attention states, sleep stages, and sleep quality and generate sleep improvement suggestions and adaptive guided imagery scripts based on electroencephalogram (EEG) and physical activity data (e.g. waveforms, power spectrogram images, numerical features). Our results show that LLMs can estimate sleep quality based on human textual behavioral features and provide personalized sleep improvement suggestions and guided imagery scripts; however detecting attention, sleep stages, and sleep quality based on EEG and activity data requires further training data and domain-specific knowledge.
When Neural Code Completion Models Size up the Situation: Attaining Cheaper and Faster Completion through Dynamic Model Inference
Sun, Zhensu, Du, Xiaoning, Song, Fu, Wang, Shangwen, Li, Li
Leveraging recent advancements in large language models, modern neural code completion models have demonstrated the capability to generate highly accurate code suggestions. However, their massive size poses challenges in terms of computational costs and environmental impact, hindering their widespread adoption in practical scenarios. Dynamic inference emerges as a promising solution, as it allocates minimal computation during inference while maintaining the model's performance. In this research, we explore dynamic inference within the context of code completion. Initially, we conducted an empirical investigation on GPT-2, focusing on the inference capabilities of intermediate layers for code completion. We found that 54.4% of tokens can be accurately generated using just the first layer, signifying significant computational savings potential. Moreover, despite using all layers, the model still fails to predict 14.5% of tokens correctly, and the subsequent completions continued from them are rarely considered helpful, with only a 4.2% Acceptance Rate. These findings motivate our exploration of dynamic inference in code completion and inspire us to enhance it with a decision-making mechanism that stops the generation of incorrect code. We thus propose a novel dynamic inference method specifically tailored for code completion models. This method aims not only to produce correct predictions with largely reduced computation but also to prevent incorrect predictions proactively. Our extensive evaluation shows that it can averagely skip 1.7 layers out of 16 layers in the models, leading to an 11.2% speedup with only a marginal 1.1% reduction in ROUGE-L.
XtremeDistilTransformers: Task Transfer for Task-agnostic Distillation
Mukherjee, Subhabrata, Awadallah, Ahmed Hassan, Gao, Jianfeng
While deep and large pre-trained models are the state-of-the-art for various natural language processing tasks, their huge size poses significant challenges for practical uses in resource constrained settings. Recent works in knowledge distillation propose task-agnostic as well as task-specific methods to compress these models, with task-specific ones often yielding higher compression rate. In this work, we develop a new task-agnostic distillation framework XtremeDistilTransformers that leverages the advantage of task-specific methods for learning a small universal model that can be applied to arbitrary tasks and languages. To this end, we study the transferability of several source tasks, augmentation resources and model architecture for distillation. We evaluate our model performance on multiple tasks, including the General Language Understanding Evaluation (GLUE) benchmark, SQuAD question answering dataset and a massive multi-lingual NER dataset with 41 languages. We release three distilled task-agnostic checkpoints with 13MM, 22MM and 33MM parameters obtaining SOTA performance in several tasks.
Epistemic Planning with Attention as a Bounded Resource
Belardinelli, Gaia, Rendsvig, Rasmus K.
Where information grows abundant, attention becomes a scarce resource. As a result, agents must plan wisely how to allocate their attention in order to achieve epistemic efficiency. Here, we present a framework for multi-agent epistemic planning with attention, based on Dynamic Epistemic Logic (DEL, a powerful formalism for epistemic planning). We identify the framework as a fragment of standard DEL, and consider its plan existence problem. While in the general case undecidable, we show that when attention is required for learning, all instances of the problem are decidable.
Seeing Machines Technology Enables GM Super Cruise Driver Assistance System
Seeing Machines (AIM: SEE), an industry leader in computer vision technologies which enable machines to see, understand and assist people, announces the automotive production debut of its FOVIO driver monitoring technology in the 2018 Cadillac CT6. The FOVIO based driver monitoring system (DMS) forms an integral part of General Motors' industry leading Super Cruise hands-free driving system for the highway, ensuring safe and confident vehicle operation. Overcoming the challenges of reliable driver monitoring is critical in hands-free driving systems to address the need for keeping drivers engaged and prepared to re-take control of the vehicle when required. The Cadillac Super Cruise system uses FOVIO vision technology, developed by Seeing Machines, to enable a gumdrop-sized infrared camera on the steering wheel column to accurately determine the driver's attention state. Determining driver attention state is accomplished through a precise measure of head orientation and eyelid movements under a full range of daytime and night-time driving conditions including the use of sunglasses.