context awareness
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- South America > Colombia > Meta Department > Villavicencio (0.04)
- (6 more...)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.93)
On the loss of context-awareness in general instruction fine-tuning
Wang, Yihan, Bai, Andrew, Peng, Nanyun, Hsieh, Cho-Jui
Pre-trained Large Language Models (LLMs) require post-training methods such as supervised fine-tuning (SFT) on instruction-response pairs to enable instruction following. However, this process can potentially harm existing capabilities learned during pre-training. In this paper, we investigate the loss of context awareness after SFT, where context awareness is defined as the ability to extract and understand information from user-provided context and respond accordingly. We are the first to identify and show that the loss of context awareness, as reflected by the performance drop in the Needle-in-a-Haystack test, occurs in instruction fine-tuned LLMs when the chat template is applied to input prompts. We identify that the performance decline is partially caused by an attention bias toward different roles learned during conversational instruction fine-tuning. We validate our hypothesis by visualizing changes in attention allocation after the chat template is applied and manually steering the attention heads. Based on these observations, we propose a metric to select context-dependent examples from general instruction fine-tuning datasets. We then apply conditional instruction fine-tuning with a context-dependency indicator, enabling the model to learn context awareness from these selected examples. Empirical experiments on four context-dependent downstream tasks and three pre-trained LLMs of different sizes show that our method effectively mitigates the loss of context awareness without compromising general instruction-following capabilities. Given our findings, we strongly advocate for careful benchmarking of context awareness after instruction fine-tuning.
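To make the selection-plus-conditioning recipe above concrete, the following hedged sketch scores an instruction example's context dependency as the rise in response negative log-likelihood when the user-provided context is dropped, and prepends an indicator tag to examples above a threshold. The GPT-2 scorer, the threshold, and the "[context-dependent]" tag are illustrative assumptions, not the paper's exact metric or template.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def response_nll(prompt: str, response: str) -> float:
    # Negative log-likelihood of the response tokens given the prompt.
    ids = tok(prompt + response, return_tensors="pt")["input_ids"]
    n_prompt = tok(prompt, return_tensors="pt")["input_ids"].shape[1]
    labels = ids.clone()
    labels[:, :n_prompt] = -100          # ignore prompt tokens in the loss
    return lm(ids, labels=labels).loss.item()

def context_dependency(context: str, instruction: str, response: str) -> float:
    with_ctx = response_nll(f"{context}\n{instruction}\n", response)
    without_ctx = response_nll(f"{instruction}\n", response)
    return without_ctx - with_ctx        # large gap => response relies on the context

example = {
    "context": "The warranty covers parts for 24 months.",
    "instruction": "How long does the warranty cover parts?",
    "response": "Parts are covered for 24 months.",
}
score = context_dependency(**example)
if score > 0.5:                          # illustrative threshold, not the paper's
    example["instruction"] = "[context-dependent] " + example["instruction"]
print(score, example["instruction"])
```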
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- (3 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.73)
HoPE: A Novel Positional Encoding Without Long-Term Decay for Enhanced Context Awareness and Extrapolation
Chen, Yuhan, Lv, Ang, Luan, Jian, Wang, Bin, Liu, Wei
Many positional encodings (PEs) are designed to exhibit long-term decay, based on an entrenched and long-standing inductive opinion: tokens farther away from the current position carry less relevant information. We argue that long-term decay is outdated in the era of LLMs, as LLMs are now applied to tasks demanding precise retrieval of in-context information from arbitrary positions. Firstly, we present empirical analyses on various PEs, demonstrating that models inherently learn attention with only a local-decay pattern while forming a U-shape pattern globally, contradicting the principle of long-term decay. Furthermore, we conduct a detailed analysis of rotary position encoding (RoPE, a prevalent relative positional encoding in LLMs) and find that the U-shape attention is caused by some learned components, which are also the key factor limiting RoPE's expressiveness and extrapolation. Inspired by these insights, we propose High-frequency rotary Position Encoding (HoPE). HoPE replaces the specific components in RoPE with position-independent ones, retaining only high-frequency signals, which also breaks the principle of long-term decay in theory. HoPE achieves two major advantages: (1) Without constraints imposed by long-term decay, contradictory factors that limit spontaneous attention optimization and model extrapolation performance are removed. (2) Components representing positions and semantics are optimized. These enhance the model's context awareness and extrapolation, as validated by extensive experiments.
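A minimal sketch of the core idea: rotary rotation is applied only to the highest-frequency dimension pairs, while the remaining low-frequency pairs are left position-independent. The cutoff `keep_high_freq` and the helper names are illustrative assumptions, not the paper's implementation.

```python
import torch

def rope_freqs(head_dim: int, base: float = 10000.0) -> torch.Tensor:
    # Standard RoPE frequencies theta_i = base^(-2i/d), highest frequency first.
    i = torch.arange(head_dim // 2, dtype=torch.float32)
    return base ** (-2.0 * i / head_dim)

def apply_hope_like_rotation(x: torch.Tensor, positions: torch.Tensor,
                             keep_high_freq: int, base: float = 10000.0) -> torch.Tensor:
    """x: (..., seq, head_dim). Rotate only the `keep_high_freq` highest-frequency
    dimension pairs; leave the remaining (low-frequency) pairs unrotated, i.e.
    position-independent."""
    d = x.shape[-1]
    angles = positions[:, None].float() * rope_freqs(d, base)[None]  # (seq, d/2)
    cos, sin = angles.cos(), angles.sin()
    # For low-frequency pairs use cos=1, sin=0 -> identity (no positional signal).
    mask = torch.arange(d // 2) < keep_high_freq
    cos = torch.where(mask, cos, torch.ones_like(cos))
    sin = torch.where(mask, sin, torch.zeros_like(sin))
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

q = torch.randn(1, 8, 16, 64)            # (batch, heads, seq, head_dim)
q_rot = apply_hope_like_rotation(q, torch.arange(16), keep_high_freq=24)
print(q_rot.shape)
```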
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Asia > Middle East > Saudi Arabia > Asir Province > Abha (0.04)
- Asia > China (0.04)
PEAR: Position-Embedding-Agnostic Attention Re-weighting Enhances Retrieval-Augmented Generation with Zero Inference Overhead
Tan, Tao, Qian, Yining, Lv, Ang, Lin, Hongzhan, Wu, Songhao, Wang, Yongbo, Wang, Feng, Wu, Jingtong, Lu, Xin, Yan, Rui
Large language models (LLMs) enhanced with retrieval-augmented generation (RAG) have introduced a new paradigm for web search. However, the limited context awareness of LLMs degrades their performance on RAG tasks. Existing methods to enhance context awareness are often inefficient, incurring time or memory overhead during inference, and many are tailored to specific position embeddings. In this paper, we propose Position-Embedding-Agnostic attention Re-weighting (PEAR), which enhances the context awareness of LLMs with zero inference overhead. Specifically, on a proxy task focused on context copying, we first detect heads that suppress the models' context awareness, thereby diminishing RAG performance. To weaken the impact of these heads, we re-weight their outputs with learnable coefficients. The LLM (with frozen parameters) is optimized by adjusting these coefficients to minimize loss on the proxy task. As a result, the coefficients are optimized to values less than one, thereby reducing their tendency to suppress RAG performance. During inference, the optimized coefficients are fixed to re-weight these heads, regardless of the specific task at hand. Our proposed PEAR offers two major advantages over previous approaches: (1) It introduces zero additional inference overhead in terms of memory usage or inference time, while outperforming competitive baselines in accuracy and efficiency across various RAG tasks. (2) It is independent of position embedding algorithms, ensuring broader applicability.
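The re-weighting mechanics can be sketched as one learnable scalar per attention head, applied to that head's output before the output projection, with every other parameter frozen. The module below is an illustrative stand-in (not PEAR's code), and the squared-output loss is a placeholder for the proxy context-copying objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HeadReweightedAttention(nn.Module):
    """Causal self-attention whose per-head outputs are scaled by learnable
    coefficients before the output projection; names are illustrative."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        self.coeff = nn.Parameter(torch.ones(n_heads))   # only these get trained

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        shape = (b, t, self.n_heads, self.d_head)
        q, k, v = (z.reshape(shape).transpose(1, 2) for z in (q, k, v))
        o = F.scaled_dot_product_attention(q, k, v, is_causal=True)  # (b, h, t, d_head)
        o = o * self.coeff.view(1, -1, 1, 1)             # re-weight each head's output
        return self.out(o.transpose(1, 2).reshape(b, t, d))

# Coefficient-only training sketch: freeze everything except the coefficients.
attn = HeadReweightedAttention(d_model=64, n_heads=8)
for name, p in attn.named_parameters():
    p.requires_grad = name == "coeff"
opt = torch.optim.Adam([attn.coeff], lr=1e-2)
x = torch.randn(2, 16, 64)
loss = attn(x).pow(2).mean()   # stand-in for the proxy context-copying loss
loss.backward()
opt.step()
```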
- North America > United States (0.14)
- Europe > France (0.04)
- Asia > Singapore (0.04)
- (7 more...)
Connectivity-Inspired Network for Context-Aware Recognition
Carloni, Gianluca, Colantonio, Sara
The aim of this paper is threefold. We inform the AI practitioner about the human visual system with an extensive literature review; we propose a novel biologically motivated neural network for image classification; and, finally, we present a new plug-and-play module to model context awareness. We focus on the effect of incorporating circuit motifs found in biological brains to address visual recognition. Our convolutional architecture is inspired by the connectivity of human cortical and subcortical streams, and we implement bottom-up and top-down modulations that mimic the extensive afferent and efferent connections between visual and cognitive areas. Our Contextual Attention Block is simple and effective and can be integrated with any feed-forward neural network. It infers weights that multiply the feature maps according to their causal influence on the scene, modeling the co-occurrence of different objects in the image. We place our module at different bottlenecks to infuse a hierarchical context awareness into the model. We validated our proposals through image classification experiments on benchmark data and found a consistent improvement both in performance and in the robustness of the explanations produced via class activation.
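As a rough illustration of the described gating behavior (inferring one weight per feature map and multiplying the maps by it), here is a minimal plug-and-play module; it omits the causal-influence modeling and is not the authors' Contextual Attention Block.

```python
import torch
import torch.nn as nn

class ContextGate(nn.Module):
    """Minimal sketch of a plug-and-play gating module: it infers one weight per
    feature map from global context and multiplies the maps by those weights."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.weigher = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                    # global context per feature map
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                               # weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.weigher(x)                      # re-weight each feature map

# Drop the module after any convolutional bottleneck, e.g.:
feats = torch.randn(1, 64, 32, 32)
print(ContextGate(64)(feats).shape)                     # torch.Size([1, 64, 32, 32])
```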
- North America > United States (0.14)
- Europe > Italy > Tuscany > Pisa Province > Pisa (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Mixture of In-Context Experts Enhance LLMs' Long Context Awareness
Lin, Hongzhan, Lv, Ang, Chen, Yuhan, Zhu, Chen, Song, Yang, Zhu, Hengshu, Yan, Rui
Many studies have revealed that large language models (LLMs) exhibit uneven awareness of different contextual positions. Their limited context awareness can lead to overlooking critical information and subsequent task failures. While several approaches have been proposed to enhance LLMs' context awareness, achieving both effectiveness and efficiency remains challenging. In this paper, for LLMs utilizing RoPE as position embeddings, we introduce a novel method called "Mixture of In-Context Experts" (MoICE) to address this challenge. MoICE comprises two key components: a router integrated into each attention head within LLMs and a lightweight router-only training optimization strategy. (1) MoICE views each RoPE angle as an 'in-context' expert, demonstrated to be capable of directing the attention of a head to specific contextual positions. Consequently, each attention head flexibly processes tokens using multiple RoPE angles dynamically selected by the router to attend to the needed positions. This approach mitigates the risk of overlooking essential contextual information. (2) The router-only training strategy entails freezing LLM parameters and exclusively updating routers for only a few steps. When applied to open-source LLMs including Llama and Mistral, MoICE surpasses prior methods across multiple tasks in long context understanding and generation, all while maintaining commendable inference efficiency.
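A single-head sketch of the routing idea: attention is computed under several candidate RoPE bases ('in-context experts') and mixed with weights from a tiny per-head router, which would be the only trained parameters. The candidate bases, routing granularity, and names are illustrative assumptions rather than the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def rope(x, pos, base):
    """Apply rotary position embedding with rotation base `base`. x: (b, t, d)."""
    d = x.shape[-1]
    freqs = base ** (-2.0 * torch.arange(d // 2, dtype=torch.float32) / d)
    ang = pos[:, None].float() * freqs[None]
    cos, sin = ang.cos(), ang.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

class MoICEHeadSketch(nn.Module):
    """One attention head mixing attention computed under several RoPE bases,
    weighted per query by a small router (causal mask omitted for brevity)."""
    def __init__(self, d_head: int, bases=(500.0, 10_000.0, 500_000.0)):
        super().__init__()
        self.bases = bases
        self.router = nn.Linear(d_head, len(bases))     # the only trained parameters

    def forward(self, q, k, v, pos):
        gates = F.softmax(self.router(q), dim=-1)       # (b, t, K) expert weights
        mixed = 0.0
        for i, base in enumerate(self.bases):
            qi, ki = rope(q, pos, base), rope(k, pos, base)
            att = F.softmax(qi @ ki.transpose(-2, -1) / qi.shape[-1] ** 0.5, dim=-1)
            mixed = mixed + gates[..., i:i + 1] * (att @ v)
        return mixed                                    # (b, t, d_head)

q = k = v = torch.randn(1, 12, 64)
print(MoICEHeadSketch(64)(q, k, v, torch.arange(12)).shape)
```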
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- (3 more...)
- Research Report > Promising Solution (0.66)
- Research Report > New Finding (0.46)
FastMem: Fast Memorization of Prompt Improves Context Awareness of Large Language Models
Zhu, Junyi, Liu, Shuochen, Yu, Yu, Tang, Bo, Yan, Yibo, Li, Zhiyu, Xiong, Feiyu, Xu, Tong, Blaschko, Matthew B.
Large language models (LLMs) excel in generating coherent text, but they often struggle with context awareness, leading to inaccuracies in tasks requiring faithful adherence to provided information. We introduce FastMem, a novel method designed to enhance instruction fine-tuned LLMs' context awareness through fast memorization of the prompt. FastMem maximizes the likelihood of the prompt before inference by fine-tuning only the last Feed-Forward Network (FFN) module. This targeted approach ensures efficient optimization without overfitting, significantly improving the model's ability to comprehend and accurately follow the context. Our experiments demonstrate substantial gains in reading comprehension, text summarization and adherence to output structures. For instance, FastMem improves the accuracy of Llama 3-8B-Inst on the NQ-SWAP dataset from 59.1% to 71.6%, and reduces the output structure failure rate of Qwen 1.5-4B-Chat from 34.9% to 25.5%. Extensive experimental results highlight FastMem's potential to offer a robust solution to enhance the reliability and accuracy of LLMs in various applications. Our code is available at: https://github.com/IAAR-Shanghai/FastMem
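The recipe lends itself to a short sketch: a handful of gradient steps that maximize the prompt's likelihood while updating only the last FFN (MLP) block, then ordinary generation with the adapted model. GPT-2 stands in for an instruction-tuned LLM here, and the learning rate and step count are illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Context: The meeting was moved to Thursday at 3 pm.\nQuestion: When is the meeting?"
batch = tok(prompt, return_tensors="pt")

# Freeze everything, then unfreeze only the final transformer block's MLP.
model.requires_grad_(False)
last_ffn = model.transformer.h[-1].mlp
last_ffn.requires_grad_(True)

opt = torch.optim.AdamW(last_ffn.parameters(), lr=1e-4)
model.train()
for _ in range(3):                                     # "fast" memorization: a few steps
    loss = model(**batch, labels=batch["input_ids"]).loss  # next-token NLL on the prompt
    opt.zero_grad()
    loss.backward()
    opt.step()

model.eval()                                           # generate with the adapted model
out = model.generate(**batch, max_new_tokens=20, pad_token_id=tok.eos_token_id)
print(tok.decode(out[0], skip_special_tokens=True))
```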
- Asia > China > Shanghai > Shanghai (0.24)
- North America > Dominican Republic (0.04)
- North America > United States > Utah (0.04)
- (8 more...)
- Leisure & Entertainment (1.00)
- Government > Military (1.00)
- Media > Film (0.68)
- Government > Regional Government > North America Government > United States Government (0.46)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.63)
CATSE: A Context-Aware Framework for Causal Target Sound Extraction
Baligar, Shrishail, Kegler, Mikolaj, Irvin, Bryce, Stamenovic, Marko, Newsam, Shawn
Target Sound Extraction (TSE) focuses on the problem of separating sources of interest, indicated by a user's cue, from the input mixture. Most existing solutions operate in an offline fashion and are not suited to the low-latency causal processing constraints imposed by applications in live-streamed content such as augmented hearing. We introduce a family of context-aware low-latency causal TSE models suitable for real-time processing. First, we explore the utility of context by providing the TSE model with oracle information about what sound classes make up the input mixture, where the objective of the model is to extract one or more sources of interest indicated by the user. Since the practical applications of oracle models are limited due to their assumptions, we introduce a composite multi-task training objective involving separation and classification losses. Our evaluation involving single- and multi-source extraction shows the benefit of using context information in the model either by means of providing full context or via the proposed multi-task training loss without the need for full context information. Specifically, we show that our proposed model outperforms size- and latency-matched Waveformer, a state-of-the-art model for real-time TSE.
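The composite objective can be illustrated with a simple sketch combining a waveform separation term with a multi-label classification term over the sound classes present in the mixture; the particular losses (negative SI-SDR, BCE) and the weight alpha are illustrative choices, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def composite_tse_loss(est_wav, ref_wav, class_logits, class_labels, alpha=0.5):
    """Separation term (negative SI-SDR) plus a multi-label classification term."""
    # Scale-invariant SDR between estimate and reference (higher is better -> negate).
    ref_energy = (ref_wav ** 2).sum(dim=-1, keepdim=True) + 1e-8
    proj = ((est_wav * ref_wav).sum(dim=-1, keepdim=True) / ref_energy) * ref_wav
    noise = est_wav - proj
    si_sdr = 10 * torch.log10((proj ** 2).sum(-1) / ((noise ** 2).sum(-1) + 1e-8) + 1e-8)
    sep_loss = -si_sdr.mean()
    # Multi-label classification of which sources are present in the mixture.
    cls_loss = F.binary_cross_entropy_with_logits(class_logits, class_labels)
    return sep_loss + alpha * cls_loss

est = torch.randn(4, 16000)              # batch of 1-second estimates at 16 kHz
ref = torch.randn(4, 16000)
logits = torch.randn(4, 10)              # 10 hypothetical sound classes
labels = torch.randint(0, 2, (4, 10)).float()
print(composite_tse_loss(est, ref, logits, labels))
```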