Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization

Hsieh, Cheng-Yu, Chuang, Yung-Sung, Li, Chun-Liang, Wang, Zifeng, Le, Long T., Kumar, Abhishek, Glass, James, Ratner, Alexander, Lee, Chen-Yu, Krishna, Ranjay, Pfister, Tomas

Jul-3-2024–arXiv.org Artificial Intelligence

Large language models (LLMs), even when specifically trained to process long input contexts, struggle to capture relevant information located in the middle of their input. This phenomenon is known as the lost-in-the-middle problem. In this work, we make three contributions. First, we set out to understand the factors that cause this phenomenon. In doing so, we establish a connection between lost-in-the-middle and LLMs' intrinsic attention bias: LLMs exhibit a U-shaped attention bias where the tokens at the beginning and at the end of the input receive higher attention, regardless of their relevance. Second, we mitigate this positional bias through a calibration mechanism, found-in-the-middle, that allows the model to attend to contexts faithfully according to their relevance, even when they are in the middle. Third, we show found-in-the-middle not only achieves better performance in locating relevant information within a long context, but also eventually leads to improved retrieval-augmented generation (RAG) performance across various tasks, outperforming

Figure 1: (a) Lost-in-the-middle refers to models' U-shape RAG performance as the relevant context's (e.g., a gold document containing the answer to a query) position varies within the input; (b) We observe models exhibit U-shape attention weights favoring leading and ending contexts, regardless of their actual contents; (c) Models do attend to relevant contexts even when placed in the middle, but are eventually distracted by leading/ending contexts; (d) We propose a calibration mechanism, found-in-the-middle, that disentangles the effect of U-shape attention bias and allows models to attend to relevant context regardless of their positions.
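The abstract does not spell out how the calibration is computed, so the following is only a rough illustration of the general idea of separating positional bias from relevance. It assumes, purely for the sake of the example, that the attention a document receives is the sum of a relevance term and a position-dependent bias term, and that the bias at each position can be estimated separately (for instance, by measuring attention on a content-free placeholder document at that position). The function names, the additive assumption, and all numbers are hypothetical and are not taken from the paper.

```python
def calibrate_attention(observed, positional_baseline):
    """Subtract a per-position baseline from observed attention scores.

    Assumption (not from the source): attention = relevance + positional bias,
    so removing an estimate of the bias leaves a relevance-like score.
    """
    if len(observed) != len(positional_baseline):
        raise ValueError("observed and positional_baseline must have the same length")
    return [o - b for o, b in zip(observed, positional_baseline)]


def rank_positions(scores):
    """Return position indices sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy numbers, invented for illustration: five documents in the context,
# with the only relevant document sitting in the middle (index 2).
baseline = [0.30, 0.15, 0.10, 0.15, 0.30]   # hypothetical U-shaped positional bias
relevance = [0.00, 0.00, 0.12, 0.00, 0.00]  # hypothetical relevance signal
observed = [b + r for b, r in zip(baseline, relevance)]

calibrated = calibrate_attention(observed, baseline)

# Before calibration, an edge position ranks first; afterwards, the middle one does.
top_before = rank_positions(observed)[0]
top_after = rank_positions(calibrated)[0]
```

In this toy setting the raw scores favor the first and last positions even though the relevant document is in the middle, while the baseline-subtracted scores rank the middle document highest. Whether a simple subtraction like this is adequate in practice depends on how well the additive assumption holds for a given model.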

large language model, machine learning, natural language


