Enhancing Robotic Manipulation with AI Feedback from Multimodal Large Language Models
Liu, Jinyi, Yuan, Yifu, Hao, Jianye, Ni, Fei, Fu, Lingzhi, Chen, Yibin, Zheng, Yan
–arXiv.org Artificial Intelligence
Recently, there has been considerable attention towards leveraging large language models (LLMs) to enhance decision-making processes. However, aligning the natural language text instructions generated by LLMs with the vectorized operations required for execution presents a significant challenge, often necessitating task-specific details. To circumvent the need for such task-specific granularity, inspired by preference-based policy learning approaches, we investigate the utilization of multimodal LLMs to provide automated preference feedback solely from image inputs to guide decision-making. In this study, we train a multimodal LLM, termed CriticGPT, capable of understanding trajectory videos in robot manipulation tasks, serving as a critic to offer analysis and preference feedback. Subsequently, we validate the effectiveness of preference labels generated by CriticGPT from a reward modeling perspective. Experimental evaluation of the algorithm's preference accuracy demonstrates its effective generalization ability to new tasks. Furthermore, performance on Meta-World tasks reveals that CriticGPT's reward model efficiently guides policy learning, surpassing rewards based on state-of-the-art pre-trained representation models.
Feb-21-2024
- Country:
- Africa > Rwanda
- Asia
- China > Tianjin Province
- Tianjin (0.04)
- Japan > Honshū
- Kansai > Osaka Prefecture > Osaka (0.04)
- Europe
- Sweden > Stockholm
- Stockholm (0.04)
- United Kingdom > England
- Greater London > London (0.04)
- North America > United States
- California > Los Angeles County
- Long Beach (0.04)
- Hawaii > Honolulu County
- Honolulu (0.04)
- Nevada (0.04)
- Oceania > New Zealand
- North Island > Auckland Region > Auckland (0.04)
- Genre:
- Research Report > New Finding (0.88)
- Technology: