Retro*: Optimizing LLMs for Reasoning-Intensive Document Retrieval
Lan, Junwei, Chen, Jianlyu, Liu, Zheng, Li, Chaofan, Bao, Siqi, Lian, Defu
arXiv.org Artificial Intelligence
With the growing popularity of LLM agents and RAG, it has become increasingly important to retrieve documents that are essential for solving a task, even when their connection to the task is indirect or implicit. Addressing this problem requires fine-grained reasoning to accurately assess the relevance between the task and each candidate document, a capability that existing IR techniques largely lack. Despite recent progress in reasoning-enhanced IR, existing approaches remain limited in applicability, scalability, and efficiency. In this work, we propose Retro*, a novel approach for reasoning-intensive document retrieval. Our method introduces a rubric-based relevance scoring mechanism, enabling the model to reason about the relationship between a task and a document based on explicitly defined criteria, thereby producing a fine-grained, interpretable relevance score. Retro* also supports test-time scaling by combining multiple reasoning trajectories via score integration, which produces more reliable relevance estimates. To optimize Retro*'s reasoning capabilities, we introduce a novel reinforcement learning algorithm tailored for its relevance scoring mechanism, which employs two composite rewards to fully exploit the trajectories of each training sample. Our experiments show that Retro* outperforms existing document retrieval methods by a notable margin, leading to state-of-the-art performance on the BRIGHT benchmark.
Large language model (LLM) agents have become increasingly important for tackling complex tasks such as software engineering, mathematics, and scientific research (Chan et al., 2024; Jin et al., 2025; Wei et al., 2025; Phan et al., 2025). In these applications, retrieval-augmented generation (RAG) (Lewis et al., 2020; Gao et al., 2023) plays a crucial role, as access to external knowledge is often necessary to produce high-quality solutions.
However, in many scenarios, retrieval models must identify useful documents even when their connection to the task is indirect or implicit, which makes the retrieval process particularly challenging. For example, in software engineering, a retrieval model may need to locate programs that share similar design patterns with the target problem rather than matching exact code snippets (Jimenez et al., 2023). In mathematics, it might involve retrieving proofs derived from the same underlying theorem, even if they are expressed differently (Chen et al., 2023). Solving such tasks requires fine-grained reasoning to bridge subtle connections between the task and candidate documents. Existing retrieval models, by contrast, are primarily designed to capture straightforward semantic relationships, such as matching question-answer pairs or identifying paraphrases (Lee et al., 2019; Karpukhin et al., 2020).
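The test-time scaling idea mentioned in the abstract, combining several reasoning trajectories through score integration, can be illustrated with a small sketch. The text here does not specify how scores are integrated or what scale they use, so this example assumes a plain arithmetic mean over integer scores; the function names `integrate_scores` and `rank_documents` and the sample data are hypothetical and are not taken from Retro*.

```python
from statistics import mean


def integrate_scores(trajectory_scores):
    """Combine per-trajectory relevance scores into one estimate.

    Assumption: integration is a plain arithmetic mean. The actual
    integration rule used by Retro* is not stated in this excerpt.
    """
    if not trajectory_scores:
        raise ValueError("need at least one trajectory score")
    return mean(trajectory_scores)


def rank_documents(scores_by_doc):
    """Return document ids sorted by integrated score, highest first."""
    integrated = {
        doc_id: integrate_scores(scores)
        for doc_id, scores in scores_by_doc.items()
    }
    return sorted(integrated, key=integrated.get, reverse=True)


# Hypothetical scores from three sampled reasoning trajectories per document.
sampled = {
    "doc_a": [80, 60, 70],
    "doc_b": [90, 20, 40],
    "doc_c": [10, 30, 20],
}

ranking = rank_documents(sampled)
print(ranking)
```

In this toy data, a single trajectory would rank doc_b first if only its highest score (90) were seen; averaging over all three trajectories dampens that outlier, which is the intuition behind calling the integrated estimate more reliable.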
Oct-14-2025