Sim2Real Transfer for Vision-Based Grasp Verification

Amargant, Pau, Hönig, Peter, Vincze, Markus

arXiv.org Artificial Intelligence 

Abstract -- The verification of successful grasps is a crucial aspect of robot manipulation, particularly when handling deformable objects. In this work, we present a vision-based approach to grasp verification that determines whether a robotic gripper has successfully grasped an object. Our method employs a two-stage architecture: first, a YOLO-based object detection model detects and localizes the robot's gripper; then, a ResNet-based classifier determines whether an object is present. To address the limitations of real-world data capture, we introduce HSR-GraspSynth, a synthetic dataset designed to simulate diverse grasping scenarios. Furthermore, we explore the use of Visual Question Answering capabilities as a zero-shot baseline against which we compare our model. Experimental results demonstrate that our approach achieves high accuracy in real-world environments, with potential for integration into grasping pipelines.

Index Terms -- Grasp verification, Robot manipulation, Deformable objects, Vision-based grasping, YOLO object detection, ResNet classification, Synthetic dataset, Visual Question Answering.

I. INTRODUCTION

Deformable object manipulation is a growing field of research in robotics due to its relevance in a wide range of tasks [26].
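The two-stage verification pipeline described in the abstract can be sketched as follows. This is a minimal illustration of the control flow only, with stub stand-ins for the YOLO detector and the ResNet classifier; all function and field names here are hypothetical, not from the paper's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Bounding box of the detected gripper region: (x1, y1, x2, y2).
BBox = Tuple[int, int, int, int]

@dataclass
class GraspVerifier:
    """Two-stage grasp verification: (1) locate the gripper in the image,
    (2) classify whether an object is present in the detected region."""
    detect_gripper: Callable[[dict], Optional[BBox]]  # stage 1: YOLO-style detector
    classify_crop: Callable[[dict, BBox], bool]       # stage 2: ResNet-style classifier

    def verify(self, image: dict) -> bool:
        box = self.detect_gripper(image)
        if box is None:
            # No gripper detected: the grasp cannot be confirmed.
            return False
        return self.classify_crop(image, box)

# Stub networks for illustration; real stages would run trained models.
def stub_detector(image: dict) -> Optional[BBox]:
    return (10, 10, 50, 50) if image.get("gripper_visible") else None

def stub_classifier(image: dict, box: BBox) -> bool:
    return bool(image.get("object_in_gripper"))

verifier = GraspVerifier(stub_detector, stub_classifier)
print(verifier.verify({"gripper_visible": True, "object_in_gripper": True}))
print(verifier.verify({"gripper_visible": False, "object_in_gripper": True}))
```

Separating detection from classification lets each stage be trained independently: the detector only needs gripper annotations, while the classifier only needs crops labeled object/no-object, which is one motivation for a synthetic dataset such as HSR-GraspSynth.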