REMAC: Self-Reflective and Self-Evolving Multi-Agent Collaboration for Long-Horizon Robot Manipulation

Yuan, Puzhen, Ma, Angyuan, Yao, Yunchao, Yao, Huaxiu, Tomizuka, Masayoshi, Ding, Mingyu

Mar-27-2025–arXiv.org Artificial Intelligence

-- Vision-language models (VLMs) have demonstrated remarkable capabilities in robotic planning, particularly for long-horizon tasks that require a holistic understanding of the environment for task decomposition. Existing methods typically rely on prior environmental knowledge or carefully designed task-specific prompts, making them struggle with dynamic scene changes or unexpected task conditions, e.g., a robot attempting to put a carrot in the microwave but finds the door was closed. Such challenges underscore two critical issues: adaptability and efficiency. T o address them, in this work, we propose an adaptive multi-agent planning framework, termed REMAC, that enables efficient, scene-agnostic multi-robot long-horizon task planning and execution through continuous reflection and self-evolution. REMAC incorporates two key modules: a self-reflection module performing pre-condition and post-condition checks in the loop to evaluate progress and refine plans, and a self-evolvement module dynamically adapting plans based on scene-specific reasoning. It offers several appealing benefits: 1) Robots can initially explore and reason about the environment without complex prompt design. T o validate REMAC's effectiveness, we build a multi-agent environment for long-horizon robot manipulation and navigation based on RoboCasa, featuring 4 task categories with 27 task styles and 50+ different objects. Based on it, we further benchmark state-of-the-art reasoning models, including DeepSeek-R1, o3-mini, QwQ, and Grok3, demonstrating REMAC's superiority by boosting average success rates by 40% and execution efficiency by 52.7% over the single robot baseline without any task-specific prompting or finetuning. In recent years, Vision-Language Models (VLMs) have seen significant application in robot control tasks [1, 2].

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

Mar-27-2025

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Asia > Japan
  - Shikoku > Kagawa Prefecture > Takamatsu (0.04)

Genre:
- Research Report (0.40)

Technology:
- Information Technology > Artificial Intelligence
  - Robots > Robot Planning & Action (1.00)
  - Representation & Reasoning > Agents (1.00)
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.67)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found