chain-of-action
Agent models: Internalizing Chain-of-Action Generation into Reasoning models
Zhang, Yuxiang, Yang, Yuqi, Shu, Jiangming, Wen, Xinyan, Sang, Jitao
Traditional agentic workflows rely on external prompts to manage interactions with tools and the environment, which limits the autonomy of reasoning models. We position Large Agent Models (LAMs) that internalize the generation of Chain-of-Action (CoA), enabling the model to autonomously decide when and how to use external tools. Our proposed AutoCoA framework combines supervised fine-tuning (SFT) and reinforcement learning (RL), allowing the model to switch seamlessly between reasoning and action while efficiently managing environment interactions. Its main components are step-level action triggering, trajectory-level CoA optimization, and an internal world model that reduces real-environment interaction costs. Evaluations on open-domain QA tasks demonstrate that AutoCoA-trained agent models significantly outperform ReAct-based workflows in task completion, especially on tasks that require long-term reasoning and multi-step actions. Code and dataset are available at https://github.com/
OpenAI has outlined five progressive stages on the path to Artificial General Intelligence (AGI). The first stage, characterized as Chatbot, is exemplified by Large Language Models (LLMs) such as GPT-3.5 and GPT-4 (OpenAI, 2023). The second stage, termed Reasoner, introduces Large Reasoning Models (LRMs) such as o1 (OpenAI, 2024) and o3. Recently, OpenAI released Operator (OpenAI, 2025a) and Deep Research (OpenAI, 2025b), signaling the arrival of the third stage: Agent. These systems reportedly combine reasoning with autonomous tool usage, enabling independent execution of multi-round workflows by interacting with the real-world environment. It is believed that the technology behind Operator and Deep Research does not merely integrate existing LLMs or LRMs with agentic workflows (e.g., ReAct (Yao et al., 2022) and Reflexion (Shinn et al., 2023)). Instead, it represents a further upgrade in model capabilities: the new models are capable of long-term planning, tool manipulation, and environmental interaction.
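To make the idea of a model that decides for itself when to act more concrete, the following is a minimal sketch of an interleaved reasoning/action loop. It is not the AutoCoA implementation: the `think:` / `action:` / `answer:` line prefixes, the `run_chain_of_action` function, and the scripted `toy_policy` and `toy_search` stand-ins are all assumptions made purely for illustration. In a real system the policy would be a trained model rather than a hand-written function.

```python
from typing import Callable, Dict, List, Optional

Policy = Callable[[List[str]], str]   # hypothetical: maps context to the next step
Tool = Callable[[str], str]           # hypothetical: maps a tool argument to an observation


def run_chain_of_action(policy: Policy, tools: Dict[str, Tool],
                        question: str, max_steps: int = 8) -> Optional[str]:
    """Let the policy alternate freely between thinking, acting, and answering."""
    context = [f"question: {question}"]
    for _ in range(max_steps):
        step = policy(context)          # the policy alone chooses the step type
        context.append(step)
        if step.startswith("answer:"):
            return step[len("answer:"):].strip()
        if step.startswith("action:"):
            # Assumed format: "action: <tool name> <argument>"
            name, _, arg = step[len("action:"):].strip().partition(" ")
            tool = tools.get(name)
            observation = tool(arg) if tool else f"unknown tool {name}"
            context.append(f"observation: {observation}")
    return None                         # step budget exhausted without an answer


def toy_policy(context: List[str]) -> str:
    """Scripted stand-in for a model: think once, search once, then answer."""
    if not any(s.startswith("think:") for s in context):
        return "think: I need to look this up."
    observations = [s for s in context if s.startswith("observation:")]
    if not observations:
        return "action: search capital of France"
    return "answer: " + observations[-1][len("observation:"):].strip()


def toy_search(query: str) -> str:
    """Stand-in for an external tool backed by a tiny fixed table."""
    return {"capital of France": "Paris"}.get(query, "no result")


result = run_chain_of_action(toy_policy, {"search": toy_search},
                             "What is the capital of France?")
```

The point of the sketch is that no external prompt template dictates the order of steps: the loop only executes whatever the policy emits and feeds the observation back into the context.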
Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models
Pan, Zhenyu, Luo, Haozheng, Li, Manling, Liu, Han
We present a Chain-of-Action (CoA) framework for multimodal and retrieval-augmented Question-Answering (QA). Compared with prior work, CoA overcomes two major challenges of current QA applications: (i) unfaithful hallucination that is inconsistent with real-time or domain facts and (ii) weak reasoning performance over compositional information. Our key contribution is a novel reasoning-retrieval mechanism that decomposes a complex question into a reasoning chain via systematic prompting and pre-designed actions. Methodologically, we propose three types of domain-adaptable "Plug-and-Play" actions for retrieving real-time information from heterogeneous sources. We also propose a multi-reference faith score (MRFS) to verify and resolve conflicts in the answers. Empirically, we use both public benchmarks and a Web3 case study to demonstrate the capability of CoA over other methods.
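As a rough illustration of decomposing a question into a chain of sub-questions, each served by a pluggable action, consider the sketch below. It is an assumption-laden toy, not the paper's method: the `ACTIONS` registry, the `register` decorator, the `Node` type, the `{prev}` placeholder, and the single `lookup` action with its hard-coded facts are all hypothetical, and the sketch does not attempt to model the three action types or the MRFS verification step.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical registry: new retrieval actions plug in without touching the runner.
ACTIONS: Dict[str, Callable[[str], str]] = {}


def register(name: str):
    """Decorator that adds a function to the action registry under `name`."""
    def wrap(fn: Callable[[str], str]) -> Callable[[str], str]:
        ACTIONS[name] = fn
        return fn
    return wrap


@register("lookup")
def lookup(sub_question: str) -> str:
    """Stand-in for a retrieval source, backed by a tiny fixed table."""
    facts = {
        "Who wrote Hamlet?": "William Shakespeare",
        "Where was William Shakespeare born?": "Stratford-upon-Avon",
    }
    return facts.get(sub_question, "unknown")


@dataclass
class Node:
    sub_question: str   # may contain "{prev}" to refer to the previous answer
    action: str         # name of a registered action


def run_chain(chain: List[Node]) -> List[str]:
    """Resolve each sub-question in order, feeding each answer into the next."""
    answers: List[str] = []
    prev = ""
    for node in chain:
        question = node.sub_question.replace("{prev}", prev)
        prev = ACTIONS[node.action](question)
        answers.append(prev)
    return answers


# "Where was the author of Hamlet born?" split into two dependent sub-questions.
chain = [
    Node("Who wrote Hamlet?", "lookup"),
    Node("Where was {prev} born?", "lookup"),
]
answers = run_chain(chain)
```

Keeping actions behind a registry is one simple way to get the "plug-and-play" property: adding a new source means registering one more function, while the chain runner stays unchanged.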