Open-vocabulary Pick and Place via Patch-level Semantic Maps

Jia, Mingxi, Huang, Haojie, Zhang, Zhewen, Wang, Chenghao, Zhao, Linfeng, Wang, Dian, Liu, Jason Xinyu, Walters, Robin, Platt, Robert, Tellex, Stefanie

Jun-21-2024–arXiv.org Artificial Intelligence

Controlling robots through natural language instructions in open-vocabulary scenarios is pivotal for enhancing human-robot collaboration and complex robot behavior synthesis. However, achieving this capability poses significant challenges due to the need for a system that can generalize from limited data to a wide range of tasks and environments. Existing methods rely on large, costly datasets and struggle with generalization. This paper introduces Grounded Equivariant Manipulation (GEM), a novel approach that leverages the generative capabilities of pre-trained vision-language models and geometric symmetries to facilitate few-shot and zero-shot learning for open-vocabulary robot manipulation tasks. Our experiments demonstrate GEM's high sample efficiency and superior generalization across diverse pick-and-place tasks in both simulation and real-world experiments, showcasing its ability to adapt to novel instructions and unseen objects with minimal data requirements. GEM represents a significant step forward in language-conditioned robot control, bridging the gap between semantic understanding and action generation in robotic systems.

large language model, natural language, semantic map

Country:
- North America > United States > New York (0.14)

Genre:
- Research Report > Promising Solution (0.48)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (0.90)
  - Robots > Robots in the Workplace (0.61)