Generalizable Coarse-to-Fine Robot Manipulation via Language-Aligned 3D Keypoints
Jianshu Hu, Lidi Wang, Shujia Li, Yunpeng Jiang, Xiao Li, Paul Weng, Yutong Ban
arXiv.org Artificial Intelligence
Hierarchical coarse-to-fine policies, in which a coarse branch predicts a region of interest to guide a fine-grained action predictor, have demonstrated significant potential in robotic 3D manipulation tasks, notably by improving sample efficiency and enabling more precise manipulation. However, even when augmented with pre-trained models, these hierarchical policies still suffer from generalization issues. To enhance generalization to novel instructions and environment variations, we propose the Coarse-to-fine Language-Aligned manipulation Policy (CLAP), a framework that integrates three key components: 1) task decomposition, 2) VLM fine-tuning for 3D keypoint prediction, and 3) 3D-aware representation. Through comprehensive experiments in simulation and on a real robot, we demonstrate its superior generalization capability. Specifically, on GemBench, a benchmark designed for evaluating generalization, our approach achieves a 12% higher average success rate than the SOTA method while using only 1/5 of the training trajectories. In real-world experiments, our policy, trained on only 10 demonstrations, successfully generalizes to novel instructions and environments.

Robot learning, especially via imitation learning, has demonstrated promising success in enabling robots to solve complex 3D manipulation tasks (Intelligence et al., 2025; Liu et al., 2024). However, scaling these methods to a broader range of real-world applications (e.g., industrial, service, or home robotics) requires enhancing both (G1) their generalization to environment variations and (G2) their skill compositional generalization.
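To make the coarse-to-fine idea in the abstract concrete, here is a minimal sketch of how a coarse 3D keypoint could define a region of interest for a fine-grained predictor. It is an illustration under assumptions and not the CLAP implementation: the function name `crop_region_of_interest`, the spherical crop, the `radius` parameter, and the toy point cloud are all hypothetical, since this section does not specify how the region of interest is formed.

```python
from math import dist
from typing import List, Tuple

Point = Tuple[float, float, float]


def crop_region_of_interest(
    points: List[Point], keypoint: Point, radius: float
) -> List[Point]:
    """Keep the points within `radius` of `keypoint`, re-centered on it.

    Hypothetical helper: a spherical crop is only one possible way to turn
    a coarse keypoint prediction into a region of interest.
    """
    return [
        (p[0] - keypoint[0], p[1] - keypoint[1], p[2] - keypoint[2])
        for p in points
        if dist(p, keypoint) <= radius
    ]


# Toy scene: two points near the predicted keypoint, two far away.
scene = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5), (0.52, 0.5, 0.5), (1.0, 1.0, 1.0)]
coarse_keypoint = (0.5, 0.5, 0.5)  # stands in for the coarse branch's output

# The fine stage would only see this local, keypoint-centered crop.
roi = crop_region_of_interest(scene, coarse_keypoint, radius=0.1)
```

In this sketch the fine stage receives a smaller input expressed relative to the keypoint, which is one intuition for why such hierarchies can help precision and sample efficiency.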
Sep-30-2025