Q-RAG: Long Context Multi-step Retrieval via Value-based Embedder Training
Sorokin, Artyom; Buzun, Nazar; Anokhin, Alexander; Inozemcev, Oleg; Vedernikov, Egor; Anokhin, Petr; Burtsev, Mikhail; Trushkov, Alexey; Yin, Wenshuai; Burnaev, Evgeny
–arXiv.org Artificial Intelligence
Retrieval-Augmented Generation (RAG) methods enhance LLM performance by efficiently filtering relevant context for LLMs, reducing hallucinations and inference cost. However, most existing RAG methods focus on single-step retrieval, which is often insufficient for answering complex questions that require multi-step search. Recently, multi-step retrieval approaches have emerged, typically involving the fine-tuning of small LLMs to perform multi-step retrieval. This type of fine-tuning is highly resource-intensive and does not enable the use of larger LLMs. In this work, we propose Q-RAG, a novel approach that fine-tunes the Embedder model for multi-step retrieval using reinforcement learning (RL). Q-RAG offers a competitive, resource-efficient alternative to existing multi-step retrieval methods for open-domain question answering and achieves state-of-the-art results on the popular long-context benchmarks BABILong and RULER for contexts up to 10M tokens.

Large language models (LLMs) have achieved impressive results across a wide range of tasks (Novikov et al., 2025; Guo et al., 2025; Yang et al., 2025). However, they still face several fundamental limitations such as static knowledge, computational inefficiency on long contexts, degraded performance caused by attention dilution, and hallucinations (Hsieh et al., 2024; Kuratov et al., 2024; Liu et al., 2025).
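The abstract's core idea, an embedder that scores passages step by step and conditions each retrieval step on evidence gathered so far, can be sketched as a greedy value-based loop. Everything below is a hypothetical illustration: the toy `embed` function (a hash-seeded random projection) stands in for a trained encoder, and `q_retrieve` is an assumed greedy policy, not the paper's actual Q-RAG implementation.

```python
import hashlib
import numpy as np

def embed(text, dim=8):
    """Toy deterministic 'embedder': a hash-seeded random unit vector.
    A stand-in for a trained encoder; hypothetical, not the paper's model."""
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "big")
    v = np.random.default_rng(seed).standard_normal(dim)
    return v / np.linalg.norm(v)

def q_retrieve(question, corpus, steps=2):
    """Greedy multi-step retrieval: at each step, score the remaining
    passages against the current state (question + passages retrieved
    so far) and take the argmax, mimicking a greedy Q-value policy."""
    state = question
    retrieved = []
    remaining = list(corpus)
    for _ in range(steps):
        if not remaining:
            break
        s = embed(state)
        # Dot products of unit vectors play the role of learned Q-values.
        scores = [float(s @ embed(p)) for p in remaining]
        best = remaining.pop(int(np.argmax(scores)))
        retrieved.append(best)
        state = state + " " + best  # condition the next step on new evidence
    return retrieved
```

In an RL-trained version, the embedder would be updated so that these scores approximate the value of retrieving each passage toward answering the question, rather than raw similarity.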
Nov-11-2025