Beyond Outcome Reward: Decoupling Search and Answering Improves LLM Agents
Yiding Wang, Zhepei Wei, Xinyu Zhu, Yu Meng
arXiv.org Artificial Intelligence
Enabling large language models (LLMs) to utilize search tools offers a promising path to overcoming fundamental limitations such as knowledge cutoffs and hallucinations. Recent work has explored reinforcement learning (RL) for training search-augmented agents that interleave reasoning and retrieval before answering. These approaches usually rely on outcome-based rewards (e.g., exact match), implicitly assuming that optimizing for final answers will also yield effective intermediate search behaviors. Our analysis challenges this assumption: we uncover multiple systematic deficiencies in search that arise under outcome-only training and ultimately degrade final answer quality, including failure to invoke tools, invalid queries, and redundant searches. To address these shortcomings, we introduce DeSA (Decoupling Search-and-Answering), a simple two-stage training framework that explicitly separates search optimization from answer generation. In Stage 1, agents are trained to improve search effectiveness with retrieval recall-based rewards. In Stage 2, outcome rewards are employed to optimize final answer generation. Across seven QA benchmarks, DeSA-trained agents consistently improve search behaviors, delivering substantially higher search recall and answer accuracy than outcome-only baselines. Notably, DeSA outperforms single-stage training approaches that simultaneously optimize recall and outcome rewards, underscoring the necessity of explicitly decoupling the two objectives.
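The abstract's two-stage framework hinges on using different reward signals per stage: a retrieval recall-based reward in Stage 1 and an outcome (e.g., exact-match) reward in Stage 2. A minimal sketch of what those two signals might look like is given below; the paper does not specify its exact formulations in the abstract, so the function names, the normalization, and the recall definition here are illustrative assumptions.

```python
# Hedged sketch of the two reward signals described in the abstract.
# The exact formulations are not given there; the normalization and
# recall definition below are assumptions for illustration.

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace for lenient string matching."""
    return " ".join(text.lower().split())

def search_recall_reward(retrieved_docs: list[str], gold_answers: list[str]) -> float:
    """Stage 1 (assumed): fraction of gold answers found in any retrieved document."""
    if not gold_answers:
        return 0.0
    docs = [normalize(d) for d in retrieved_docs]
    hits = sum(any(normalize(a) in d for d in docs) for a in gold_answers)
    return hits / len(gold_answers)

def outcome_reward(predicted: str, gold_answers: list[str]) -> float:
    """Stage 2 (assumed): exact-match reward on the final answer."""
    return float(any(normalize(predicted) == normalize(a) for a in gold_answers))
```

Under this sketch, Stage 1 rewards the agent whenever its searches surface the gold answer at all, regardless of what it finally generates, while Stage 2 rewards only a correct final answer — which is the decoupling the abstract argues for.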
Oct-7-2025