Training Task Reasoning LLM Agents for Multi-turn Task Planning via Single-turn Reinforcement Learning
Hanjiang Hu, Changliu Liu, Na Li, Yebin Wang
arXiv.org Artificial Intelligence
Large Language Models (LLMs) acting as autonomous agents are central to modern AI-based systems: they perceive environments, reason about plans, and execute actions that interact with those environments [1]. Modern LLM agents demonstrate strong capabilities in knowledge integration, multi-step reasoning, and adaptive planning, as evidenced by their success in applications ranging from web search to robotic control [2, 3]. Building on these capabilities, prompt-based agentic frameworks [4-6] integrate observation of the environment state, reasoning via an LLM augmented with tools and memory, and action execution through structured interfaces, framing the interaction with the environment as a series of single-turn exchanges. However, building such LLM-based agents requires labor-intensive prompt engineering, and test-time scaling over multi-turn interactions with the environment is computationally expensive [7, 8]. Training LLM agents via reinforcement learning (RL) for complex multi-turn task planning therefore becomes a promising way to build effective agentic systems with low test-time cost [9-11]. Yet current RL approaches face critical challenges when applied to LLMs' multi-turn interactions with the environment [12-15].
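The observe–reason–act cycle that these prompt-based frameworks run as a series of single-turn interactions can be sketched as follows. This is a minimal toy illustration, not the paper's method: the `policy` function stands in for a prompted LLM call, and `Env` is a hypothetical two-step environment invented for the example.

```python
def policy(observation: str, memory: list[str]) -> str:
    """Stand-in for a prompted LLM: maps an observation (plus memory) to an action."""
    if "door closed" in observation:
        return "open door"
    if "door open" in observation:
        return "walk through"
    return "noop"

class Env:
    """Toy environment with a two-step task: open the door, then walk through it."""
    def __init__(self) -> None:
        self.state = "door closed"

    def observe(self) -> str:
        return self.state

    def step(self, action: str) -> bool:
        if self.state == "door closed" and action == "open door":
            self.state = "door open"
        elif self.state == "door open" and action == "walk through":
            self.state = "done"
        return self.state == "done"

def run_agent(env: Env, max_turns: int = 5) -> list[str]:
    memory: list[str] = []            # persists across the single-turn interactions
    for _ in range(max_turns):
        obs = env.observe()           # 1) observe the environment state
        action = policy(obs, memory)  # 2) reason (one single-turn LLM call)
        done = env.step(action)       # 3) execute the action via a structured interface
        memory.append(f"{obs} -> {action}")
        if done:
            break
    return memory

trace = run_agent(Env())
```

In a real agentic framework, each turn of this loop is one LLM invocation whose prompt must be engineered by hand, which is exactly the per-turn cost that motivates training the agent with RL instead.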
Dec-9-2025