Training Task Reasoning LLM Agents for Multi-turn Task Planning via Single-turn Reinforcement Learning
Hanjiang Hu, Changliu Liu, Na Li, Yebin Wang
arXiv.org Artificial Intelligence
Large Language Models (LLMs) acting as autonomous agents are central to modern AI-based systems: they perceive environments, reason about plans, and execute actions that interact with those environments [1]. Modern LLM agents demonstrate strong capabilities in knowledge integration, multi-step reasoning, and adaptive planning, as evidenced by their success in applications ranging from web search to robotic control [2, 3]. Building on these capabilities, prompt-based agentic frameworks [4-6] integrate observation of the environment state, reasoning via an LLM augmented with tools and memory, and action execution through structured interfaces, framing the interaction with the environment as a series of single-turn exchanges. However, building such LLM-based agents requires labor-intensive prompt engineering, and test-time scaling over multi-turn interactions with the environment is computationally expensive [7, 8]. Training LLM agents via reinforcement learning (RL) for complex multi-turn task planning therefore becomes a promising way to build effective agentic systems with low test-time cost [9-11]. Yet current RL approaches face critical challenges when applied to LLMs' multi-turn interactions with the environment [12-15].
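The observe–reason–act cycle that these prompt-based frameworks run as a series of single-turn interactions can be sketched as follows. This is a minimal toy illustration, not the paper's method: the `policy` function stands in for a prompted LLM call, and `Env` is a hypothetical two-step environment invented for the example.

```python
def policy(observation: str, memory: list[str]) -> str:
    """Stand-in for a prompted LLM: maps an observation (plus memory) to an action."""
    if "door closed" in observation:
        return "open door"
    if "door open" in observation:
        return "walk through"
    return "noop"

class Env:
    """Toy environment with a two-step task: open the door, then walk through it."""
    def __init__(self) -> None:
        self.state = "door closed"

    def observe(self) -> str:
        return self.state

    def step(self, action: str) -> bool:
        if self.state == "door closed" and action == "open door":
            self.state = "door open"
        elif self.state == "door open" and action == "walk through":
            self.state = "done"
        return self.state == "done"

def run_agent(env: Env, max_turns: int = 5) -> list[str]:
    memory: list[str] = []            # persists across the single-turn interactions
    for _ in range(max_turns):
        obs = env.observe()           # 1) observe the environment state
        action = policy(obs, memory)  # 2) reason (one single-turn LLM call)
        done = env.step(action)       # 3) execute the action via a structured interface
        memory.append(f"{obs} -> {action}")
        if done:
            break
    return memory

trace = run_agent(Env())
```

In a real agentic framework, each turn of this loop is one LLM invocation whose prompt must be engineered by hand, which is exactly the per-turn cost that motivates training the agent with RL instead.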
Dec-9-2025