Eliciting In-context Retrieval and Reasoning for Long-context Large Language Models
Yifu Qiu, Varun Embar, Yizhe Zhang, Navdeep Jaitly, Shay B. Cohen, Benjamin Han
arXiv.org Artificial Intelligence
Recent advancements in long-context language models (LCLMs) promise to transform Retrieval-Augmented Generation (RAG) by simplifying pipelines. With their expanded context windows, LCLMs can process entire knowledge bases and perform retrieval and reasoning directly -- a capability we define as In-Context Retrieval and Reasoning (ICR^2). However, existing benchmarks like LOFT often overestimate LCLM performance by providing overly simplified contexts. To address this, we introduce ICR^2, a benchmark that evaluates LCLMs in more realistic scenarios by including confounding passages retrieved with strong retrievers. We then propose three methods to enhance LCLM performance: (1) retrieve-then-generate fine-tuning, (2) retrieval-attention-probing, which uses attention heads to filter and de-noise long contexts during decoding, and (3) joint retrieval head training alongside the generation head. Our evaluation of five well-known LCLMs on LOFT and ICR^2 demonstrates significant gains with our best approach applied to Mistral-7B: +17 and +15 points by Exact Match on LOFT, and +13 and +2 points on ICR^2, compared to vanilla RAG and supervised fine-tuning, respectively. It even outperforms GPT-4-Turbo on most tasks despite being a much smaller model.
Jan-14-2025
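The abstract names retrieval-attention-probing only at a high level. As a rough illustration of the general idea (not the authors' implementation), the sketch below scores each in-context passage by the attention mass the question tokens place on it in a single forward pass, keeps the top-scoring passages, and then generates from the filtered context. The model checkpoint, the layer/head averaging, the passage-span bookkeeping, and the top-k rule are all assumptions made for illustration.

```python
# Hypothetical sketch of attention-based passage filtering ("retrieval-attention-
# probing" in spirit): rank in-context passages by how much attention the question
# tokens pay to them, keep the top-k, and generate from the de-noised context.
# Illustrative only; not the authors' released code. Full attention matrices over a
# very long context are memory-heavy, so this is meant for small toy inputs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-Instruct-v0.2"  # assumed checkpoint; any causal LM works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def attention_probe(question: str, passages: list[str], top_k: int = 4) -> str:
    # Build one long prompt and record each passage's (approximate) token span.
    # Per-piece tokenization may differ slightly from tokenizing the whole prompt;
    # for a sketch this approximation is acceptable.
    spans, parts, cursor = [], [], 0
    for p in passages:
        piece = p + "\n\n"
        n_tokens = len(tok(piece, add_special_tokens=False).input_ids)
        spans.append((cursor, cursor + n_tokens))
        parts.append(piece)
        cursor += n_tokens
    q_text = "Question: " + question + "\nAnswer:"
    prompt = "".join(parts) + q_text
    inputs = tok(prompt, return_tensors="pt", add_special_tokens=False)

    with torch.no_grad():
        out = model(**inputs, output_attentions=True)

    # Average attention over layers and heads, then take the rows where the
    # question tokens (the last n_q positions) attend to the context.
    attn = torch.stack(out.attentions).mean(dim=(0, 2))[0]   # (seq, seq)
    n_q = len(tok(q_text, add_special_tokens=False).input_ids)
    q_rows = attn[-n_q:]                                      # (n_q, seq)

    # Score each passage by the attention mass it receives from the question,
    # keep the top-k passages in their original order, and regenerate.
    scores = [q_rows[:, s:e].sum().item() for (s, e) in spans]
    keep = sorted(range(len(passages)), key=lambda i: -scores[i])[:top_k]
    filtered = "".join(parts[i] for i in sorted(keep)) + q_text

    gen = model.generate(**tok(filtered, return_tensors="pt"), max_new_tokens=64)
    return tok.decode(gen[0], skip_special_tokens=True)
```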