Bridging VLM and KMP: Enabling Fine-grained robotic manipulation via Semantic Keypoints Representation

Zhu, Junjie, Liu, Huayu, Wang, Jin, Wen, Bangrong, Huang, Kaixiang, Li, Xiaofei, Zhan, Haiyun, Lu, Guodong

Mar-4-2025–arXiv.org Artificial Intelligence

From early Movement Primitive (MP) techniques to modern Vision-Language Models (VLMs), autonomous manipulation has remained a pivotal topic in robotics. As two extremes, VLM-based methods emphasize zero-shot and adaptive manipulation but struggle with fine-grained planning. In contrast, MP-based approaches excel in precise trajectory generalization but lack decision-making ability. To leverage the strengths of the two frameworks, we propose VL-MP, which integrates VLM with Kernelized Movement Primitives (KMP) via a low-distortion decision information transfer bridge, enabling fine-grained robotic manipulation under ambiguous situations. One key of VL-MP is the accurate representation of task decision parameters through semantic keypoints constraints, leading to more precise task parameter generation. Additionally, we introduce a local trajectory feature-enhanced KMP to support VL-MP, thereby achieving shape preservation for complex trajectories. Extensive experiments conducted in complex real-world environments validate the effectiveness of VL-MP for adaptive and fine-grained manipulation.

generalization, manipulation, trajectory, (17 more...)

arXiv.org Artificial Intelligence

Mar-4-2025

arXiv.org PDF

Add feedback

Country:
- Asia > China > Zhejiang Province
  - Ningbo (0.05)
  - Hangzhou (0.04)

Genre:
- Research Report (0.82)

Technology:
- Information Technology > Artificial Intelligence
  - Robots (1.00)
  - Natural Language > Large Language Model (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found