OVAL-Grasp: Open-Vocabulary Affordance Localization for Task Oriented Grasping
Tong, Edmond, Balaji, Advaith, Opipari, Anthony, Lewis, Stanley, Zeng, Zhen, Jenkins, Odest Chadwicke
arXiv.org Artificial Intelligence
To manipulate objects in novel, unstructured environments, robots need task-oriented grasps that target specific object parts according to the given task. Geometry-based methods often struggle with visually defined parts, occlusions, and unseen objects. We introduce OVAL-Grasp, a zero-shot, open-vocabulary approach to task-oriented, affordance-based grasping that uses large language models (LLMs) and vision-language models (VLMs) to allow a robot to grasp objects at the correct part for a given task. Given an RGB image and a task, OVAL-Grasp identifies parts to grasp or avoid with an LLM, segments them with a VLM, and generates a 2D heatmap of actionable regions on the object. In our evaluations, our method outperformed two task-oriented grasping baselines in experiments with 20 household objects, each with 3 unique tasks. OVAL-Grasp identifies and segments the correct object part 95% of the time and grasps the correct actionable area 78.3% of the time in real-world experiments with the Fetch mobile manipulator. Additionally, OVAL-Grasp finds correct object parts under partial occlusion, achieving a part selection success rate of 80% in cluttered scenes. We also demonstrate OVAL-Grasp's efficacy in scenarios that rely on visual features for part selection, and show the benefit of a modular design through ablation experiments. Our project webpage is available at https://ekjt.github.io/OVAL-Grasp/.
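The last stage of the pipeline described above, turning part masks into a 2D heatmap of actionable regions, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how masks are combined, so the box-blur smoothing, the subtraction of avoid regions, and the function names `box_blur`, `affordance_heatmap`, and `best_grasp_pixel` are all assumptions made for illustration. The LLM and VLM stages are not modeled; their output is assumed to arrive as boolean masks.

```python
import numpy as np


def box_blur(mask, radius):
    """Mean-filter a 2D array with a (2*radius+1)^2 window and zero padding."""
    k = 2 * radius + 1
    padded = np.pad(mask.astype(float), radius, mode="constant")
    out = np.zeros(mask.shape, dtype=float)
    h, w = mask.shape
    # Sum every shifted copy of the padded array, then divide by window area.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)


def affordance_heatmap(grasp_masks, avoid_masks, shape, radius=1):
    """Combine part masks into a heatmap in [0, 1] (hypothetical scheme).

    Parts to grasp add smoothed weight, parts to avoid subtract it,
    and negative values are clipped to zero before normalization.
    """
    heat = np.zeros(shape, dtype=float)
    for m in grasp_masks:
        heat += box_blur(m, radius)
    for m in avoid_masks:
        heat -= box_blur(m, radius)
    heat = np.clip(heat, 0.0, None)
    peak = heat.max()
    return heat / peak if peak > 0 else heat


def best_grasp_pixel(heat):
    """Return the (row, col) of the highest-scoring pixel."""
    row, col = np.unravel_index(np.argmax(heat), heat.shape)
    return int(row), int(col)


# Toy 6x8 "image" of a knife for a cutting task: the handle is the part
# to grasp and the blade is the part to avoid (both masks are made up).
shape = (6, 8)
handle = np.zeros(shape, dtype=bool)
handle[2:4, 0:3] = True
blade = np.zeros(shape, dtype=bool)
blade[2:4, 4:8] = True

heat = affordance_heatmap([handle], [blade], shape)
print(best_grasp_pixel(heat))
```

In this toy case the selected pixel falls inside the handle mask and the blade region scores zero. A real system would still need to map the chosen 2D pixel to a 3D grasp pose, which this sketch does not attempt.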
Nov-27-2025
- Country:
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
- Genre:
- Research Report (0.65)
- Technology: