A Training-Free Framework for Precise Mobile Manipulation of Small Everyday Objects
Arjun Gupta, Rishik Sathua, Saurabh Gupta
arXiv.org Artificial Intelligence
Figure 1: Many everyday mobile manipulation tasks require reaching a precise interaction site before executing a motion primitive, e.g. Open-loop execution is unable to meet the high precision needed for these tasks. In this paper, we develop Servoing with Vision Models (SVM), a training-free framework that closes the loop to enable a commodity mobile manipulator to tackle these tasks.

Abstract -- Many everyday mobile manipulation tasks require precise interaction with small objects, such as grasping a knob to open a cabinet or pressing a light switch. In this paper, we develop Servoing with Vision Models (SVM), a closed-loop, training-free framework that enables a mobile manipulator to tackle such precise tasks involving the manipulation of small objects. SVM employs an RGB-D wrist camera and uses visual servoing for control. Our novelty lies in the use of state-of-the-art vision models to reliably compute 3D targets from the wrist image for diverse tasks, even under occlusion from the end-effector. To mitigate occlusion artifacts, we employ vision models to out-paint the end-effector, thereby significantly enhancing target localization. We demonstrate that, aided by out-painting methods, open-vocabulary object detectors can serve as a drop-in module to identify semantic targets (e.g.
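The servoing pipeline the abstract describes (detect the target in the wrist image, back-project it to 3D with depth, and command motion toward it) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, camera intrinsics, gain, and tolerance are all assumptions for the sake of the example, and the target-detection step (open-vocabulary detector plus out-painting) is abstracted away into a given pixel coordinate.

```python
import math

def backproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with metric depth (m) into the camera frame
    using the pinhole model. Intrinsics (fx, fy, cx, cy) are hypothetical."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

def servo_step(target_cam, gain=0.5, tol=0.005):
    """One proportional visual-servoing step: a velocity command (m/s) in the
    camera frame that moves the end-effector toward the 3D target.
    Returns None once within tolerance, i.e. ready for the motion primitive."""
    dist = math.sqrt(sum(c * c for c in target_cam))
    if dist < tol:
        return None  # converged: hand off to the open-loop primitive
    return tuple(gain * c for c in target_cam)

# Example: a knob detected at pixel (340, 260) at 0.4 m depth, with
# made-up intrinsics for a 640x480 wrist camera.
p = backproject(340, 260, 0.4, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
v = servo_step(p)
```

In a closed loop, the detector would re-localize the target in each new wrist image (with the end-effector out-painted to avoid occlusion), so localization errors are corrected continuously rather than committed to once, which is what distinguishes this from open-loop execution.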
Feb-19-2025