FUNCTO: Function-Centric One-Shot Imitation Learning for Tool Manipulation
Tang, Chao, Xiao, Anxing, Deng, Yuhong, Hu, Tianrun, Dong, Wenlong, Zhang, Hanbo, Hsu, David, Zhang, Hong
arXiv.org Artificial Intelligence
Abstract: Learning tool use from a single human demonstration video offers a highly intuitive and efficient approach to robot teaching. While humans can effortlessly generalize a demonstrated tool manipulation skill to diverse tools that support the same function (e.g., pouring with a mug versus a teapot), current one-shot imitation learning (OSIL) methods struggle to achieve this. A key challenge lies in establishing functional correspondences between demonstration and test tools, considering significant geometric variations among tools with the same function (i.e., intra-function variations). To address this challenge, we propose FUNCTO (Function-Centric OSIL for Tool Manipulation), an OSIL method that establishes function-centric correspondences with a 3D functional keypoint representation, enabling robots to generalize tool manipulation skills from a single human demonstration video to novel tools with the same function despite significant intra-function variations. We evaluate FUNCTO against existing modular OSIL methods and end-to-end behavioral cloning methods through real-robot experiments on diverse tool manipulation tasks. The results demonstrate the superiority of FUNCTO when generalizing to novel tools with intra-function geometric variations. More details are available at https://sites.google.com/view/functo.
The ability to use tools has long been recognized as a hallmark of human intelligence [1]. Endowing robots with the same capability holds the promise of unlocking a wide range of downstream tasks and applications [2, 3, 4]. As a step towards this goal, we tackle the problem of one-shot imitation learning (OSIL) for tool manipulation, which involves teaching robots a tool manipulation skill with a single human demonstration video.
Previous OSIL methods [4, 5, 6, 7, 8, 9, 10] assume that tools supporting the same function share highly similar shapes or appearances.
[...] above, it remains a non-trivial challenge for robots due to significant geometric variations (e.g., shape, size, topology) [...]
Feb-17-2025