ThinkGrasp: A Vision-Language System for Strategic Part Grasping in Clutter
Yaoyao Qian, Xupeng Zhu, Ondrej Biza, Shuo Jiang, Linfeng Zhao, Haojie Huang, Yu Qi, Robert Platt
The field of robotic grasping has seen significant advances in recent years, with deep learning and vision-language models driving progress toward more intelligent and adaptable grasping systems [1, 2, 3]. However, robotic grasping in highly cluttered environments remains a major challenge, as target objects are often severely occluded or completely hidden [4, 5, 6]. Even state-of-the-art methods struggle to accurately identify and grasp objects in such scenarios. To address this challenge, we propose ThinkGrasp, which combines the strengths of large-scale pretrained vision-language models with an occlusion handling system. ThinkGrasp leverages the advanced reasoning capabilities of models like GPT-4o [7] to gain a visual understanding of environmental and object properties such as sharpness and material composition. By integrating this knowledge through a structured prompt-based chain of thought, ThinkGrasp significantly improves success rates and ensures safe grasp poses by strategically removing obstructing objects. For instance, it prioritizes larger, centrally located objects to maximize visibility and access, and focuses on grasping the safest and most advantageous parts, such as handles or flat surfaces. Unlike VL-Grasp [8], which relies on the RoboRefIt dataset for robotic perception and reasoning, ThinkGrasp benefits from GPT-4o's reasoning and generalization capabilities. This allows ThinkGrasp to intuitively select the right objects and achieve higher performance in complex environments, as demonstrated by our comparative experiments.
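The abstract describes a prioritization strategy of favoring larger, more centrally located obstructing objects so that removing them maximizes visibility of the occluded target. The sketch below is a minimal, illustrative interpretation of that heuristic, not the authors' implementation; the names ObjectCandidate and prioritize_candidates, the scoring weights, and the area/centroid features are assumptions introduced here for illustration.

```python
# Minimal sketch (assumed, not the ThinkGrasp codebase) of the prioritization
# heuristic from the abstract: larger and more centrally located objects are
# ranked first, so removing them maximizes visibility of the hidden target.
from dataclasses import dataclass
import math


@dataclass
class ObjectCandidate:
    name: str              # label proposed by the vision-language model
    area_px: float         # segmented mask area in pixels (proxy for size)
    centroid: tuple        # (x, y) mask centroid in image coordinates


def prioritize_candidates(candidates, image_size, size_weight=0.5, center_weight=0.5):
    """Rank obstructing objects: larger and more central objects score higher."""
    w, h = image_size
    cx, cy = w / 2.0, h / 2.0
    max_area = max(c.area_px for c in candidates) or 1.0
    max_dist = math.hypot(cx, cy)

    def score(c):
        size_term = c.area_px / max_area                     # 1.0 for the largest object
        dist = math.hypot(c.centroid[0] - cx, c.centroid[1] - cy)
        center_term = 1.0 - dist / max_dist                  # 1.0 at the image center
        return size_weight * size_term + center_weight * center_term

    return sorted(candidates, key=score, reverse=True)


if __name__ == "__main__":
    clutter = [
        ObjectCandidate("box",    area_px=12000, centroid=(320, 240)),
        ObjectCandidate("mug",    area_px=3000,  centroid=(100, 400)),
        ObjectCandidate("bottle", area_px=8000,  centroid=(500, 120)),
    ]
    ranked = prioritize_candidates(clutter, image_size=(640, 480))
    print([c.name for c in ranked])  # the large, central box is removed first
```

In the actual system, the abstract indicates this kind of prioritization is carried out through GPT-4o's structured prompt-based chain of thought rather than a fixed scoring function; the sketch only makes the stated size-and-centrality preference concrete.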
arXiv.org Artificial Intelligence
Jul-15-2024