ViTacGen: Robotic Pushing with Vision-to-Touch Generation

Wu, Zhiyuan, Lin, Yijiong, Zhao, Yongqiang, Zhang, Xuyang, Chen, Zhuo, Lepora, Nathan, Luo, Shan

arXiv.org Artificial Intelligence 

Abstract: Robotic pushing is a fundamental manipulation task that requires tactile feedback to capture subtle contact forces and dynamics between the end-effector and the object. However, real tactile sensors often face hardware limitations such as high cost and fragility, as well as deployment challenges involving calibration and variation between sensors, while vision-only policies struggle to achieve satisfactory performance. Inspired by humans' ability to infer tactile states from vision, we propose ViTacGen, a novel robot manipulation framework designed for visual robotic pushing with vision-to-touch generation in reinforcement learning. It eliminates the reliance on high-resolution real tactile sensors, enabling effective zero-shot deployment on visual-only robotic systems. Specifically, ViTacGen consists of an encoder-decoder vision-to-touch generation network that generates contact depth images, a standardized tactile representation, directly from visual image sequences, followed by a reinforcement learning policy that fuses visual-tactile data with contrastive learning based on visual and generated tactile observations.

Robotic pushing is a fundamental manipulation task that involves applying forces to move objects toward a specified target region [1]. This task requires precise perception of the interactions between the robot and its environment during execution to enable accurate dynamic control [2]. In recent years, data-driven reinforcement learning (RL) approaches relying primarily on visual input have been widely explored for robotic pushing tasks.
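The abstract describes a policy that fuses visual and generated tactile observations with contrastive learning, but it does not name the specific objective. As a minimal sketch, assuming an InfoNCE-style loss (a common choice, not confirmed by the source), the idea of pulling matching visual and tactile embeddings together can be illustrated as follows. The function names, the toy embeddings, and the temperature value are all hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(visual, tactile, temperature=0.1):
    """InfoNCE-style loss, assumed here for illustration only.

    visual[i] and tactile[i] are treated as a positive pair; every other
    tactile embedding in the batch acts as a negative for visual[i].
    """
    vis = [l2_normalize(v) for v in visual]
    tac = [l2_normalize(t) for t in tactile]
    losses = []
    for i, v in enumerate(vis):
        # Cosine similarity between one visual embedding and all tactile ones.
        logits = [sum(a * b for a, b in zip(v, t)) / temperature for t in tac]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings: pairs that line up give a lower loss than mismatched pairs.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this toy setup the loss is lower when each visual embedding is paired with its own tactile embedding than when the pairs are swapped, which is the behavior a contrastive fusion objective is meant to reward.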
