Enhancing Robotic Manipulation with AI Feedback from Multimodal Large Language Models

Liu, Jinyi, Yuan, Yifu, Hao, Jianye, Ni, Fei, Fu, Lingzhi, Chen, Yibin, Zheng, Yan

Feb-21-2024–arXiv.org Artificial Intelligence

Recently, there has been considerable attention towards leveraging large language models (LLMs) to enhance decision-making processes. However, aligning the natural language text instructions generated by LLMs with the vectorized operations required for execution presents a significant challenge, often necessitating task-specific details. To circumvent the need for such task-specific granularity, inspired by preference-based policy learning approaches, we investigate the utilization of multimodal LLMs to provide automated preference feedback solely from image inputs to guide decision-making. In this study, we train a multimodal LLM, termed CriticGPT, capable of understanding trajectory videos in robot manipulation tasks, serving as a critic to offer analysis and preference feedback. Subsequently, we validate the effectiveness of preference labels generated by CriticGPT from a reward modeling perspective. Experimental evaluation of the algorithm's preference accuracy demonstrates its effective generalization ability to new tasks. Furthermore, performance on Meta-World tasks reveals that CriticGPT's reward model efficiently guides policy learning, surpassing rewards based on state-of-the-art pre-trained representation models.
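The abstract does not say how preference labels are turned into a reward signal. As a minimal sketch, assuming the Bradley-Terry formulation commonly used in preference-based policy learning (an assumption, not confirmed by the abstract), a reward model can be scored against a critic's label as follows. The function names and the choice to sum per-step rewards over a trajectory segment are hypothetical, introduced only for illustration.

```python
import math


def segment_return(rewards):
    """Sum of predicted per-step rewards over one trajectory segment."""
    return sum(rewards)


def preference_probability(rewards_a, rewards_b):
    """Bradley-Terry probability that segment A is preferred over segment B.

    Assumption: preference depends only on the difference of segment returns.
    """
    diff = segment_return(rewards_a) - segment_return(rewards_b)
    # Numerically stable logistic function of the return difference.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def preference_loss(rewards_a, rewards_b, label):
    """Cross-entropy between the predicted preference and a critic label.

    label is 1.0 if the critic prefers segment A, 0.0 if it prefers segment B.
    """
    p = preference_probability(rewards_a, rewards_b)
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p + eps) + (1.0 - label) * math.log(1.0 - p + eps))
```

Under these assumptions, minimizing `preference_loss` over many labeled segment pairs pushes the reward model to assign a higher return to the segment the critic preferred.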

Genre:
- Research Report > New Finding (0.88)

Technology:
- Information Technology > Artificial Intelligence
  - Robots (1.00)
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.95)
