VIMA: General Robot Manipulation with Multimodal Prompts
Yunfan Jiang, Agrim Gupta, Zichen Zhang, Guanzhi Wang, Yongqiang Dou, Yanjun Chen, Li Fei-Fei, Anima Anandkumar, Yuke Zhu, Linxi Fan
arXiv.org Artificial Intelligence
Prompt-based learning has emerged as a successful paradigm in natural language processing, where a single general-purpose language model can be instructed to perform any task specified by input prompts. Yet task specification in robotics comes in various forms, such as imitating one-shot demonstrations, following language instructions, and reaching visual goals. These are often treated as different tasks and tackled by specialized models. We show that a wide spectrum of robot manipulation tasks can be expressed with multimodal prompts that interleave textual and visual tokens. Accordingly, we develop a new simulation benchmark that consists of thousands of procedurally generated tabletop tasks with multimodal prompts, 600K+ expert trajectories for imitation learning, and a four-level evaluation protocol for systematic generalization. We design a transformer-based robot agent, VIMA, that processes these prompts and outputs motor actions autoregressively. VIMA features a recipe that achieves strong model scalability and data efficiency. It outperforms alternative designs in the hardest zero-shot generalization setting by up to $2.9\times$ in task success rate given the same training data. With $10\times$ less training data, VIMA still performs $2.7\times$ better than the best competing variant. Code and video demos are available at https://vimalabs.github.io/
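To make the idea of a prompt that interleaves textual and visual tokens concrete, here is a minimal illustrative sketch. It is not VIMA's implementation: the `TextToken` and `ImageToken` classes, the `build_prompt` helper, the `{name}` placeholder syntax, and the example instruction are all hypothetical names chosen for illustration, and image identifiers stand in for the actual visual inputs.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class TextToken:
    """A single word of the instruction (hypothetical representation)."""
    text: str


@dataclass
class ImageToken:
    """A visual element of the prompt; image_id is a stand-in for pixel data."""
    image_id: str


PromptToken = Union[TextToken, ImageToken]


def build_prompt(template: str, images: Dict[str, str]) -> List[PromptToken]:
    """Turn a template such as 'Put the {object} into the {container}'
    into one sequence where each {name} placeholder becomes an ImageToken
    and every other whitespace-separated word becomes a TextToken."""
    tokens: List[PromptToken] = []
    for word in template.split():
        if word.startswith("{") and word.endswith("}"):
            name = word[1:-1]
            # Raises KeyError if the template names an image that was not supplied.
            tokens.append(ImageToken(images[name]))
        else:
            tokens.append(TextToken(word))
    return tokens


def modality_pattern(prompt: List[PromptToken]) -> str:
    """Summarize the interleaving as a string of T (text) and V (visual)."""
    return "".join("T" if isinstance(t, TextToken) else "V" for t in prompt)


# Hypothetical example instruction with two image placeholders.
example = build_prompt(
    "Put the {object} into the {container}",
    {"object": "img_001", "container": "img_002"},
)
print(modality_pattern(example))
```

In this sketch, the text and image elements live in one ordered sequence, which is the property that lets a single sequence model consume language instructions, visual goals, and demonstrations through the same interface.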
May-28-2023