LongRAG: A Dual-Perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering

Zhao, Qingfei, Wang, Ruobing, Cen, Yukuo, Zha, Daren, Tan, Shicheng, Dong, Yuxiao, Tang, Jie

Nov-1-2024–arXiv.org Artificial Intelligence

Long-Context Question Answering (LCQA), a challenging task, aims to reason over long-context documents to yield accurate answers to questions. Existing long-context Large Language Models (LLMs) for LCQA often struggle with the "lost in the middle" issue. Retrieval-Augmented Generation (RAG) mitigates this issue by providing external factual evidence. However, its chunking strategy disrupts the global long-context information, and its low-quality retrieval in long contexts hinders LLMs from identifying effective factual details due to substantial noise. To this end, we propose LongRAG, a general, dual-perspective, and robust LLM-based RAG system paradigm for LCQA to enhance RAG's understanding of complex long-context knowledge (i.e., global information and factual details). We design LongRAG as a plug-and-play paradigm, facilitating adaptation to various domains and LLMs. Extensive experiments on three multi-hop datasets demonstrate that LongRAG significantly outperforms long-context LLMs (up by 6.94%), advanced RAG (up by 6.16%), and Vanilla RAG (up by 17.25%). Furthermore, we conduct quantitative ablation studies and multi-dimensional analyses, highlighting the effectiveness of the system's components and fine-tuning strategies. Data and code are available at https://github.com/QingFei1/LongRAG.
