In-the-Flow Agentic System Optimization for Effective Planning and Tool Use

Li, Zhuofeng, Zhang, Haoxiang, Han, Seungju, Liu, Sheng, Xie, Jianwen, Zhang, Yu, Choi, Yejin, Zou, James, Lu, Pan

Oct-8-2025–arXiv.org Artificial Intelligence

Outcome-driven reinforcement learning has advanced reasoning in large language models (LLMs), but prevailing tool-augmented approaches train a single, monolithic policy that interleaves thoughts and tool calls under full context; this scales poorly with long horizons and diverse tools and generalizes weakly to new scenarios. Agentic systems offer a promising alternative by decomposing work across specialized modules, yet most remain training-free or rely on offline training decoupled from the live dynamics of multi-turn interaction. We introduce AgentFlow, a trainable, in-the-flow agentic framework that coordinates four modules (planner, executor, verifier, generator) through an evolving memory and directly optimizes its planner inside the multi-turn loop. To train on-policy in live environments, we propose Flow-based Group Refined Policy Optimization (Flow-GRPO), which tackles long-horizon, sparse-reward credit assignment by converting multi-turn optimization into a sequence of tractable single-turn policy updates. It broadcasts a single, verifiable trajectory-level outcome to every turn to align local planner decisions with global success and stabilizes learning with group-normalized advantages. Across ten benchmarks, AgentFlow with a 7B-scale backbone outperforms top-performing baselines with average accuracy gains of 14.9% on search, 14.0% on agentic, 14.5% on mathematical, and 4.1% on scientific tasks, even surpassing larger proprietary models like GPT-4o. Further analyses confirm the benefits of in-the-flow optimization, showing improved planning, enhanced tool-calling reliability, and positive scaling with model size and reasoning turns.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

Oct-8-2025

arXiv.org PDF

Add feedback

Country:
- Europe (0.46)
- North America > United States (0.27)

Genre:
- Research Report > New Finding (0.45)

Industry:
- Leisure & Entertainment (1.00)
- Media > Film (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.89)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found