PhysicalAgent: Towards General Cognitive Robotics with Foundation World Models

Lykov, Artem; Sam, Jeffrin; Nguyen, Hung Khang; Kozlovskiy, Vladislav; Mahmoud, Yara; Serpiva, Valerii; Cabrera, Miguel Altamirano; Konenkov, Mikhail; Tsetserukou, Dzmitry

Sep-18-2025–arXiv.org Artificial Intelligence

Abstract: We introduce PhysicalAgent, an agentic framework for robotic manipulation that integrates iterative reasoning, diffusion-based video generation, and closed-loop execution. Given a textual instruction, our method generates short video demonstrations of candidate trajectories, executes them on the robot, and iteratively re-plans in response to failures. This approach enables robust recovery from execution errors. We evaluate PhysicalAgent across multiple perceptual modalities (egocentric, third-person, and simulated) and robotic embodiments (bimanual UR3, Unitree G1 humanoid, simulated GR1), comparing against state-of-the-art task-specific baselines. Experiments demonstrate that our method consistently outperforms prior approaches, achieving up to 83% success on human-familiar tasks. Physical trials reveal that first-attempt success is limited (20-30%), yet iterative correction increases overall success to 80% across platforms. These results highlight the potential of video-based generative reasoning for general-purpose robotic manipulation and underscore the importance of iterative execution for recovering from initial failures. Our framework paves the way for scalable, adaptable, and robust robot control.

The rapid progress of large foundation models has transformed the design of AI agents.
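The generate, execute, and re-plan cycle described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: `run_closed_loop`, `Attempt`, `FlakyRobot`, and the stub functions are hypothetical names introduced here, and the abstract does not say how success is detected or what feedback is passed to the next planning round. The sketch simply assumes a boolean success check and passes the last observation back to the generator.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Attempt:
    index: int      # 1-based attempt number
    success: bool   # whether the (assumed) success check passed


def run_closed_loop(
    instruction: str,
    generate_video: Callable[[str, Optional[str]], str],
    execute: Callable[[str], str],
    check_success: Callable[[str, str], bool],
    max_attempts: int = 5,
) -> List[Attempt]:
    """Generate a candidate video, execute it, and re-plan on failure."""
    history: List[Attempt] = []
    feedback: Optional[str] = None  # nothing to condition on before the first try
    for index in range(1, max_attempts + 1):
        video = generate_video(instruction, feedback)   # plan as a short video
        observation = execute(video)                    # run it on the robot
        ok = check_success(instruction, observation)    # assumed success signal
        history.append(Attempt(index, ok))
        if ok:
            break
        feedback = observation  # assumption: the failure observation informs re-planning
    return history


# Toy stand-ins so the loop can be run without a robot or a video model.
def fake_generate(instruction: str, feedback: Optional[str]) -> str:
    return f"video for '{instruction}' (feedback: {feedback})"


class FlakyRobot:
    """Hypothetical robot that fails a fixed number of times, then succeeds."""

    def __init__(self, fail_first: int) -> None:
        self.fail_first = fail_first
        self.calls = 0

    def execute(self, video: str) -> str:
        self.calls += 1
        return "failed" if self.calls <= self.fail_first else "done"


def fake_check(instruction: str, observation: str) -> bool:
    return observation == "done"


robot = FlakyRobot(fail_first=2)
history = run_closed_loop("pick up the cup", fake_generate, robot.execute, fake_check)
print([(a.index, a.success) for a in history])
# prints [(1, False), (2, False), (3, True)]
```

The loop stops at the first success or after `max_attempts` tries, which mirrors the reported pattern of low first-attempt success followed by recovery through iteration. The attempt budget of 5 is an arbitrary choice for the sketch, not a figure from the paper.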

large language model, machine learning, natural language

Genre:
- Research Report
  - Experimental Study (0.93)
  - New Finding (0.68)

Industry:
- Energy (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Robots (1.00)
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.68)
