GLOVER: Generalizable Open-Vocabulary Affordance Reasoning for Task-Oriented Grasping
Teli Ma, Zifan Wang, Jiaming Zhou, Mengmeng Wang, Junwei Liang
arXiv.org Artificial Intelligence
Inferring affordable (i.e., graspable) parts of arbitrary objects from human specifications is essential for robots to advance toward open-vocabulary manipulation. Current grasp planners, however, are hindered by limited vision-language comprehension and time-consuming 3D radiance modeling, which restrict real-time, open-vocabulary interactions with objects. To address these limitations, we propose GLOVER, a unified Generalizable Open-Vocabulary Affordance Reasoning framework that fine-tunes Large Language Models (LLMs) to predict the visual affordances of graspable object parts within the RGB feature space. We compile a dataset of over 10,000 images of human-object interactions, annotated with unified visual and linguistic affordance labels, to enable multi-modal fine-tuning. GLOVER inherits world knowledge and common-sense reasoning from LLMs, facilitating finer-grained object understanding and more sophisticated tool-use reasoning. For effective real-world deployment, we present Affordance-Aware Grasping Estimation (AGE), a non-parametric grasp planner that aligns the gripper pose with a superquadric surface derived from the predicted affordance. In evaluations across 30 real-world scenes, GLOVER achieves success rates of 86.0% in part identification and 76.3% in grasping, and runs approximately 330 times faster in affordance reasoning and 40 times faster in grasp pose estimation than the previous state-of-the-art.
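The abstract does not spell out AGE's formulation, but the core idea of a superquadric-based, non-parametric planner can be sketched. Below is a minimal, hypothetical Python sketch, not the paper's actual method or API: all function names, the least-squares fitting objective, and the normal-based approach direction are our assumptions. It fits a superquadric inside-outside function to 3D points lifted from an affordance region, then derives a grasp approach direction from the surface normal at a contact point.

```python
# Hedged sketch of a superquadric-based grasp planner (illustrative only).
import numpy as np
from scipy.optimize import least_squares

def sq_inside_outside(p, a1, a2, a3, e1, e2):
    # Superquadric inside-outside function F(p); F = 1 on the surface.
    x = np.abs(p[:, 0]) / a1
    y = np.abs(p[:, 1]) / a2
    z = np.abs(p[:, 2]) / a3
    return (x ** (2.0 / e2) + y ** (2.0 / e2)) ** (e2 / e1) + z ** (2.0 / e1)

def fit_superquadric(points):
    # points: (N, 3) affordance-region points, assumed centered and
    # axis-aligned here (a full planner would also optimize the 6-DoF pose).
    def residual(theta):
        a1, a2, a3, e1, e2 = theta
        # A common radial-error objective: sqrt(a1*a2*a3) * (F**e1 - 1).
        F = sq_inside_outside(points, a1, a2, a3, e1, e2)
        return np.sqrt(a1 * a2 * a3) * (F ** e1 - 1.0)
    extents = (points.max(axis=0) - points.min(axis=0)) / 2.0
    x0 = np.concatenate([np.maximum(extents, 1e-2), [1.0, 1.0]])
    bounds = ([1e-3, 1e-3, 1e-3, 0.1, 0.1],
              [np.inf, np.inf, np.inf, 2.0, 2.0])
    return least_squares(residual, x0, bounds=bounds).x

def grasp_at(point, params, eps=1e-4):
    # Approach direction = negated outward normal, estimated as the
    # numerical gradient of the inside-outside function at the point.
    grad = np.zeros(3)
    for i in range(3):
        d = np.zeros(3)
        d[i] = eps
        hi = sq_inside_outside((point + d)[None, :], *params)[0]
        lo = sq_inside_outside((point - d)[None, :], *params)[0]
        grad[i] = (hi - lo) / (2.0 * eps)
    normal = grad / np.linalg.norm(grad)
    return point, -normal  # gripper position and approach vector
```

Because such a planner reduces to a closed-form surface fit plus a gradient evaluation, it needs no learned grasp network, which is consistent with the abstract's description of AGE as non-parametric and fast.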
Nov-19-2024
- Genre:
- Research Report (0.82)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning > Statistical Learning (1.00)
- Natural Language > Large Language Model (0.90)
- Representation & Reasoning (1.00)
- Robots (1.00)
- Vision (1.00)