Transferring Foundation Models for Generalizable Robotic Manipulation
Yang, Jiange, Tan, Wenhui, Jin, Chuhao, Yao, Keling, Liu, Bei, Fu, Jianlong, Song, Ruihua, Wu, Gangshan, Wang, Limin
Improving the generalization capabilities of general-purpose robotic manipulation agents in the real world has long been a significant challenge. Existing approaches often rely on collecting large-scale robotic data, such as the RT-1 dataset, which is costly and time-consuming. Moreover, due to insufficient data diversity, these approaches are typically limited in open-domain scenarios with novel objects and diverse environments. In this paper, we propose a novel paradigm that effectively leverages language-grounded segmentation masks generated by Internet-scale foundation models to address a wide range of pick-and-place robotic manipulation tasks. By integrating the mask modality, which incorporates semantic, geometric, and temporal-correlation priors derived from vision foundation models, into the end-to-end policy model, our approach perceives object pose effectively and robustly and enables sample-efficient generalization learning, including to new object instances, semantic categories, and unseen backgrounds. We first introduce a series of foundation models to ground natural-language demands across multiple tasks. Second, we develop a two-stream 2D policy model based on imitation learning, which uses raw images, object masks, and robot proprioception to predict robot actions. Extensive real-world experiments conducted on a Franka Emika robot arm demonstrate the effectiveness of the proposed paradigm. Demos are available on YouTube (https://www.youtube.com/watch?v=MAcUPFBfRIw).
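To make the described two-stream architecture concrete, below is a minimal sketch (not the authors' code) of a mask-conditioned imitation-learning policy: one stream encodes the raw RGB image, a second encodes the language-grounded object mask, and the fused features are combined with robot proprioception to predict an action. All module names, layer sizes, and dimensions are assumptions for illustration.

```python
# Minimal sketch of a two-stream, mask-conditioned behavior-cloning policy.
# All architecture choices (layer sizes, action dimension) are assumptions.
import torch
import torch.nn as nn


class TwoStreamPolicy(nn.Module):
    def __init__(self, proprio_dim: int = 7, action_dim: int = 7):
        super().__init__()
        # Stream 1: encodes the raw RGB observation (3 channels).
        self.rgb_encoder = self._make_cnn(in_channels=3)
        # Stream 2: encodes the language-grounded object mask (1 channel).
        self.mask_encoder = self._make_cnn(in_channels=1)
        # Fusion head: concatenated visual features + proprioception -> action.
        self.head = nn.Sequential(
            nn.Linear(2 * 128 + proprio_dim, 256),
            nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    @staticmethod
    def _make_cnn(in_channels: int) -> nn.Module:
        return nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),  # -> (B, 128)
        )

    def forward(self, rgb, mask, proprio):
        feat = torch.cat(
            [self.rgb_encoder(rgb), self.mask_encoder(mask), proprio], dim=-1
        )
        return self.head(feat)  # predicted action, e.g. a delta end-effector pose


# One behavior-cloning step on a (hypothetical) demonstration batch.
policy = TwoStreamPolicy()
rgb = torch.randn(8, 3, 128, 128)       # raw camera images
mask = torch.randn(8, 1, 128, 128)      # segmentation masks from a vision foundation model
proprio = torch.randn(8, 7)             # joint / end-effector state
expert_action = torch.randn(8, 7)       # demonstrated actions
loss = nn.functional.mse_loss(policy(rgb, mask, proprio), expert_action)
loss.backward()
```

The key design point reflected here is that the mask is treated as an extra input modality with its own encoder, so semantic and geometric priors from the foundation model reach the policy without requiring the RGB stream to relearn them from limited robot data.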
arXiv.org Artificial Intelligence
Oct-7-2023