From Evidence to Trajectory: Abductive Reasoning Path Synthesis for Training Retrieval-Augmented Generation Agents
Li, Muzhi, Qi, Jinhu, Wu, Yihong, Zhao, Minghao, Ma, Liheng, Li, Yifan, Wang, Xinyu, Zhang, Yingxue, Leung, Ho-fung, King, Irwin
–arXiv.org Artificial Intelligence
The development of retrieval-augmented generation (RAG) agents is hindered by the lack of process-level supervision to guide agentic capabilities such as task decomposition, retriever invocation, and stepwise decision-making. While reinforcement learning offers a potential solution, it suffers from sparse rewards and the limited reasoning capabilities of large language models (LLMs). Meanwhile, existing data synthesis methods only produce chain-of-thought rationales and fail to model environmental interactions. In this paper, we propose EviPath, an evidence-anchored reasoning path synthesis paradigm for RAG agent development. EviPath comprises: (i) Abductive Subtask Planning, which decomposes the problem into sub-questions and iteratively plans an optimal solution path based on the dependencies between them; (ii) Faithful Sub-question Answering, which uses supporting evidence to construct a proxy environment that generates reasoning thoughts and answers for each sub-question; and (iii) Conversational Fine-Tuning, which formats the complete agent-environment interaction trajectory into a dialogue format suitable for supervised fine-tuning. EviPath allows LLMs to learn complex reasoning and tool-use capabilities directly from synthesized data. Extensive experiments on widely-used question-answering benchmarks show that an 8B-parameter model trained with EviPath-synthesized data significantly and consistently outperforms state-of-the-art baselines, with a double-digit absolute EM gain of 14.7% in open-domain question answering.

Retrieval-augmented generation (RAG) agents, powered by large language models (LLMs) (Guo et al., 2025), can autonomously gather external knowledge and answer complex, multi-hop questions.
Compared to vanilla RAG systems (Lewis et al., 2020), RAG agents minimize the need for human intervention and adapt readily to downstream applications such as math problem solving (Zhu et al., 2025), code generation (Zhang et al., 2023), and financial analysis (Wang et al., 2025c). Despite their promise, RAG agents are hard to develop because ground-truth reasoning trajectories are unavailable. Mainstream multi-hop question-answering datasets (Yang et al., 2018; Ho et al., 2020; Trivedi et al., 2022) provide final answers and supporting facts but lack the step-wise supervision that is crucial for equipping LLMs with agentic behaviors such as question decomposition, search-query reformulation, and plan refinement.
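The pipeline described above, decomposing a question into dependency-ordered sub-questions and then serializing the agent-environment interaction into a dialogue for supervised fine-tuning, can be sketched roughly as follows. This is an illustrative approximation, not the paper's actual implementation: the function names, the `(thought, tool_call, observation)` step structure, and the chat roles are all assumptions introduced here.

```python
from collections import deque

def plan_subtasks(deps):
    """Order sub-questions so each comes after its prerequisites
    (Kahn's topological sort). `deps` maps a sub-question id to the
    list of sub-question ids it depends on -- a hypothetical stand-in
    for EviPath's abductive subtask planning."""
    indegree = {q: len(prereqs) for q, prereqs in deps.items()}
    children = {q: [] for q in deps}
    for q, prereqs in deps.items():
        for p in prereqs:
            children[p].append(q)
    queue = deque(sorted(q for q, d in indegree.items() if d == 0))
    order = []
    while queue:
        q = queue.popleft()
        order.append(q)
        for c in children[q]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    if len(order) != len(deps):
        raise ValueError("cyclic dependencies between sub-questions")
    return order

def trajectory_to_dialogue(question, steps, final_answer):
    """Flatten an agent-environment trajectory of
    (thought, tool_call, observation) steps into chat-format messages
    for supervised fine-tuning. Roles and message layout are assumed."""
    messages = [{"role": "user", "content": question}]
    for thought, tool_call, observation in steps:
        messages.append({"role": "assistant",
                         "content": f"Thought: {thought}\nAction: {tool_call}"})
        messages.append({"role": "tool", "content": observation})
    messages.append({"role": "assistant",
                     "content": f"Final Answer: {final_answer}"})
    return messages
```

For example, a two-hop question yields one `user` turn, an assistant/tool pair per retrieval step, and a closing assistant turn with the final answer; such message lists can be fed directly to standard chat-template SFT tooling.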
Sep-30-2025