A Training-Free Framework for Precise Mobile Manipulation of Small Everyday Objects
Arjun Gupta, Rishik Sathua, Saurabh Gupta
arXiv.org Artificial Intelligence
Figure 1: Many everyday mobile manipulation tasks require reaching a precise interaction site before executing a motion primitive, e.g. Open-loop execution is unable to meet the high precision needed for these tasks. In this paper, we develop Servoing with Vision Models (SVM), a training-free framework that closes the loop to enable a commodity mobile manipulator to tackle these tasks.

Abstract -- Many everyday mobile manipulation tasks require precise interaction with small objects, such as grasping a knob to open a cabinet or pressing a light switch. In this paper, we develop Servoing with Vision Models (SVM), a closed-loop, training-free framework that enables a mobile manipulator to tackle such precise tasks involving the manipulation of small objects. SVM employs an RGB-D wrist camera and uses visual servoing for control. Our novelty lies in the use of state-of-the-art vision models to reliably compute 3D targets from the wrist image for diverse tasks, even under occlusion from the end-effector. To mitigate occlusion artifacts, we employ vision models to out-paint the end-effector, thereby significantly enhancing target localization. We demonstrate that, aided by out-painting methods, open-vocabulary object detectors can serve as a drop-in module to identify semantic targets (e.g.
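The servoing pipeline the abstract describes (detect the target in the wrist image, back-project it to 3D with depth, and command motion toward it) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, camera intrinsics, gain, and tolerance are all assumptions for the sake of the example, and the target-detection step (open-vocabulary detector plus out-painting) is abstracted away into a given pixel coordinate.

```python
import math

def backproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with metric depth (m) into the camera frame
    using the pinhole model. Intrinsics (fx, fy, cx, cy) are hypothetical."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

def servo_step(target_cam, gain=0.5, tol=0.005):
    """One proportional visual-servoing step: a velocity command (m/s) in the
    camera frame that moves the end-effector toward the 3D target.
    Returns None once within tolerance, i.e. ready for the motion primitive."""
    dist = math.sqrt(sum(c * c for c in target_cam))
    if dist < tol:
        return None  # converged: hand off to the open-loop primitive
    return tuple(gain * c for c in target_cam)

# Example: a knob detected at pixel (340, 260) at 0.4 m depth, with
# made-up intrinsics for a 640x480 wrist camera.
p = backproject(340, 260, 0.4, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
v = servo_step(p)
```

In a closed loop, the detector would re-localize the target in each new wrist image (with the end-effector out-painted to avoid occlusion), so localization errors are corrected continuously rather than committed to once, which is what distinguishes this from open-loop execution.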
Feb-19-2025