Mitigating Lost-in-Retrieval Problems in Retrieval Augmented Multi-Hop Question Answering
Rongzhi Zhu, Xiangyu Liu, Zequn Sun, Yiwei Wang, Wei Hu
arXiv.org Artificial Intelligence
In this paper, we identify a critical problem, "lost-in-retrieval", in retrieval-augmented multi-hop question answering (QA): key entities are missed in LLMs' sub-question decomposition. "Lost-in-retrieval" significantly degrades retrieval performance, which disrupts the reasoning chain and leads to incorrect answers. To resolve this problem, we propose a progressive retrieval and rewriting method, namely ChainRAG, which sequentially handles each sub-question by completing missing key entities and retrieving relevant sentences from a sentence graph for answer generation. Each step in our retrieval and rewriting process builds upon the previous one, creating a seamless chain that leads to accurate retrieval and answers. Finally, all retrieved sentences and sub-question answers are integrated to generate a comprehensive answer to the original question. We evaluate ChainRAG on three multi-hop QA datasets – MuSiQue, 2Wiki, and HotpotQA – using three large language models: GPT4o-mini, Qwen2.5-72B, and GLM-4-Plus. Empirical results demonstrate that ChainRAG consistently outperforms baselines in both effectiveness and efficiency.
Feb-19-2025
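To make the pipeline described in the abstract concrete, below is a minimal, self-contained sketch of a ChainRAG-style loop: decompose the question, rewrite each sub-question with entities recovered from earlier hops, retrieve from a sentence graph, and finally integrate everything into one answer. The prompts, the word-overlap retrieval, and every helper name here are illustrative assumptions, not the authors' released implementation.

```python
from typing import Callable, Dict, List, Set

# Any prompt -> completion function (e.g., a thin wrapper around an LLM API).
LLM = Callable[[str], str]


def decompose(question: str, llm: LLM) -> List[str]:
    """Ask the LLM for sub-questions, one per line (hypothetical prompt format)."""
    reply = llm(f"Decompose into sub-questions, one per line:\n{question}")
    return [line.strip() for line in reply.splitlines() if line.strip()]


def rewrite_with_entities(sub_q: str, prior_answers: List[str], llm: LLM) -> str:
    """Complete missing key entities in a sub-question using earlier hop answers."""
    if not prior_answers:
        return sub_q
    return llm(
        "Rewrite the question, filling in any missing entities from the known facts.\n"
        f"Known facts: {'; '.join(prior_answers)}\nQuestion: {sub_q}"
    )


def retrieve_sentences(query: str, graph: Dict[str, Set[str]], k: int = 2) -> List[str]:
    """Toy sentence-graph retrieval: rank sentence nodes by word overlap with the
    query, then expand the top-k hits along their graph edges."""
    q_words = set(query.lower().split())
    ranked = sorted(
        graph, key=lambda s: len(q_words & set(s.lower().split())), reverse=True
    )
    retrieved: List[str] = []
    for seed in ranked[:k]:
        for sentence in (seed, *graph[seed]):
            if sentence not in retrieved:
                retrieved.append(sentence)
    return retrieved


def chain_rag(question: str, llm: LLM, graph: Dict[str, Set[str]]) -> str:
    """Progressive retrieval and rewriting: each hop builds on the previous one."""
    context: List[str] = []
    sub_answers: List[str] = []
    for sub_q in decompose(question, llm):
        rewritten = rewrite_with_entities(sub_q, sub_answers, llm)
        sentences = retrieve_sentences(rewritten, graph)
        context.extend(sentences)
        sub_answers.append(
            llm(f"Context: {' '.join(sentences)}\nAnswer briefly: {rewritten}")
        )
    # Integrate all retrieved sentences and sub-question answers into the final answer.
    return llm(
        f"Context: {' '.join(context)}\nSub-answers: {'; '.join(sub_answers)}\n"
        f"Answer the original question: {question}"
    )
```

In the paper's actual system, the retrieval step would rely on the constructed sentence graph and a learned retriever rather than the word-overlap heuristic used in this sketch.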