Reinforcement Learning for Long-Horizon Multi-Turn Search Agents

Oct-29-2025–arXiv.org Artificial Intelligence

Large Language Model (LLM) agents can leverage multiple turns and tools to solve complex tasks, with prompt-based approaches achieving strong performance. This work demonstrates that Reinforcement Learning (RL) can push capabilities significantly further by learning from experience. Through experiments on a legal document search benchmark, we show that our RL-trained 14 Billion parameter model outperforms frontier class models (85% vs 78% accuracy). In addition, we explore turn-restricted regimes, during training and at test-time, that show these agents achieve better results if allowed to operate over longer multi-turn horizons.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

Oct-29-2025

arXiv.org PDF

Add feedback

Country:
- Europe > Austria (0.28)
- Asia > Middle East
  - UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)

Genre:
- Research Report (0.51)

Industry:
- Law (0.36)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.31)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found