Object-Centric Mobile Manipulation through SAM2-Guided Perception and Imitation Learning
Zhicheng, Wang; Yagi, Satoshi; Yamamori, Satoshi; Morimoto, Jun
arXiv.org Artificial Intelligence
Manipulation tasks are a key milestone toward integrating robots into everyday life. They are particularly challenging because they require direct interaction with objects, demanding higher levels of precision and robustness [1, 2]. Mobile manipulation, the integration of navigation and object manipulation, is essential for a domestic service robot, as it enables a single platform to perform diverse tasks in unstructured home environments [3]. While fixed robots excel at repetitive factory operations, they lack the flexibility required to handle household chores. Consequently, the ability to generalize across tasks and environments becomes a critical capability for a truly versatile domestic assistant. In recent years, learning-based control paradigms have gained significant traction for adapting to the unstructured environments encountered in manipulation tasks [4, 5]. Among these approaches, vision-based methods are especially appealing, as they leverage the ease of obtaining visual data and enable end-to-end mapping from raw images to action outputs. These methods, which directly generate robot joint motions from camera images, facilitate the rapid deployment of domestic robots by eliminating the need for costly, large-scale sensor setups. However, end-to-end mapping has several drawbacks.
Jul-16-2025
- Genre:
  - Research Report (1.00)
- Technology:
  - Information Technology > Artificial Intelligence
    - Robots (1.00)
    - Natural Language > Large Language Model (0.47)
    - Machine Learning > Neural Networks (0.46)