Learning Any-View 6DoF Robotic Grasping in Cluttered Scenes via Neural Surface Rendering

Snehal Jauhri, Ishikaa Lunawat, Georgia Chalvatzaki

arXiv.org Artificial Intelligence 

Robotic manipulation is crucial in various applications, such as industrial automation and assistive robotics. A key component of manipulation is effective 6DoF grasping in cluttered environments, as this ability enhances the efficiency, versatility, and autonomy of robotic systems operating in unstructured settings. Grasping effectively from limited sensory input reduces the need for extensive exploration and multiple viewpoints, enabling efficient, time-saving solutions for robotic applications. Robotic grasping involves generating suitable poses for the robot's end-effector given some sensory information (e.g., visual data). While planar bin picking, i.e., top-down 4DoF grasping (3D position and roll orientation) with two-fingered or suction grippers, has largely been solved thanks to deep learning models [1-4], 6DoF grasping in the wild, i.e., grasping in the SE(3) space of 3D positions and 3D rotations from any viewpoint, remains a challenge [5, 6]. Embodied AI agents, e.g., mobile manipulation robots [7, 8], are expected to perform manipulation tasks much as humans do: humans can leverage geometric information from limited views, together with mental models, to grasp objects without exploring to reconstruct the scene. Such grasping in open, cluttered spaces requires that a robot, given some spatial sensory information such as 3D pointcloud data, can reconstruct the scene, understand the graspable regions of different objects, and finally select grasps that are highly likely to succeed, both in lifting an object for a subsequent manipulation task and, crucially, in avoiding collisions that could damage the surrounding environment.

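Since the abstract contrasts 4DoF planar bin picking with full 6DoF grasping in SE(3), the sketch below illustrates one common way such grasps are parameterized: a 3D position plus a 3D rotation, with the planar top-down case recoverable as a special orientation. This is a minimal, hedged example; the names (Grasp6DoF, planar_to_se3) and the default gripper width are illustrative assumptions, not part of the paper's method or any published API.

```python
"""Minimal sketch (assumed, not from the paper): grasp poses in SE(3)."""
from dataclasses import dataclass

import numpy as np
from scipy.spatial.transform import Rotation as R


@dataclass
class Grasp6DoF:
    position: np.ndarray   # (3,) gripper position in the world frame
    rotation: np.ndarray   # (3, 3) gripper orientation as a rotation matrix
    width: float = 0.08    # assumed jaw opening in metres (illustrative)

    def as_matrix(self) -> np.ndarray:
        """Return the grasp as a 4x4 homogeneous SE(3) transform."""
        T = np.eye(4)
        T[:3, :3] = self.rotation
        T[:3, 3] = self.position
        return T


def planar_to_se3(x: float, y: float, z: float, yaw: float) -> Grasp6DoF:
    """Embed a top-down 4DoF grasp (3D position + roll about the vertical
    axis) into the full SE(3) parameterization used for 6DoF grasping."""
    # Top-down approach: flip the gripper so its approach axis points along
    # the world -z axis, then rotate the fingers about that axis by `yaw`.
    approach_down = R.from_euler("x", np.pi)
    rotation = (R.from_euler("z", yaw) * approach_down).as_matrix()
    return Grasp6DoF(position=np.array([x, y, z]), rotation=rotation)


if __name__ == "__main__":
    grasp = planar_to_se3(x=0.4, y=-0.1, z=0.25, yaw=np.pi / 4)
    print(grasp.as_matrix())   # 4x4 pose a downstream motion planner could consume
```

A full 6DoF grasp generator would produce arbitrary rotation matrices rather than only top-down ones, which is precisely the harder SE(3) setting the abstract refers to.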