GraspCorrect: Robotic Grasp Correction via Vision-Language Model-Guided Feedback
Lee, Sungjae, Hong, Yeonjoo, Kim, Kwang In
arXiv.org Artificial Intelligence
Despite significant advancements in robotic manipulation, achieving consistent and stable grasping remains a fundamental challenge, often limiting the successful execution of complex tasks. Our analysis reveals that even state-of-the-art policy models frequently exhibit unstable grasping behaviors, leading to failure cases that create bottlenecks in real-world robotic applications. To address these challenges, we introduce GraspCorrect, a plug-and-play module designed to enhance grasp performance through vision-language model-guided feedback. GraspCorrect employs an iterative visual question-answering framework with two key components: grasp-guided prompting, which incorporates task-specific constraints, and object-aware sampling, which ensures the selection of physically feasible grasp candidates. By iteratively generating intermediate visual goals and translating them into joint-level actions, GraspCorrect significantly improves grasp stability and consistently enhances task success rates across existing policy models in the RLBench and CALVIN datasets.
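The iterative loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the vision-language model query is replaced by a plain callable (`choose`, hypothetical), object-aware sampling is reduced to drawing pixels from a binary object mask, and the translation of visual goals into joint-level actions is omitted. All function names, parameters, and the shrinking-window refinement are assumptions made for illustration.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[int, int]  # (row, col) pixel on the image


def sample_on_object(mask: Sequence[Sequence[int]], k: int, rng: random.Random,
                     center: Optional[Point] = None,
                     radius: Optional[int] = None) -> List[Point]:
    """Draw up to k candidate grasp pixels that lie on the object mask.

    If a center and radius are given, prefer pixels inside that square window.
    """
    pixels = [(r, c) for r, row in enumerate(mask)
              for c, v in enumerate(row) if v]
    if not pixels:
        raise ValueError("object mask is empty")
    if center is not None and radius is not None:
        near = [p for p in pixels
                if abs(p[0] - center[0]) <= radius
                and abs(p[1] - center[1]) <= radius]
        pixels = near or pixels  # fall back to the whole object if window is empty
    return rng.sample(pixels, min(k, len(pixels)))


def correct_grasp(mask: Sequence[Sequence[int]],
                  choose: Callable[[List[Point]], Point],
                  rounds: int = 3, k: int = 4, seed: int = 0) -> Point:
    """Iteratively refine a grasp point.

    `choose` stands in for the VLM query: it receives the candidates and
    returns the one it prefers (hypothetical interface).
    """
    rng = random.Random(seed)
    best: Optional[Point] = None
    radius = max(len(mask), len(mask[0]))
    for _ in range(rounds):
        candidates = sample_on_object(mask, k, rng, best, radius)
        if best is not None and best not in candidates:
            candidates.append(best)  # keep the current choice in the running
        best = choose(candidates)
        radius = max(1, radius // 2)  # narrow the search around the choice
    return best
```

In this sketch, every candidate is guaranteed to lie on the object, and the previous choice is always offered again, so a consistent chooser never moves to a candidate it ranks lower.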
Mar-19-2025
- Country:
  - North America > United States
    - Iowa (0.04)
    - Massachusetts > Middlesex County
      - Cambridge (0.04)
- Genre:
  - Research Report (1.00)
- Technology:
  - Information Technology > Artificial Intelligence
    - Vision (1.00)
    - Robots (1.00)
    - Natural Language > Large Language Model (0.68)
    - Machine Learning > Neural Networks
      - Deep Learning (0.93)