Pre-training Auto-regressive Robotic Models with 4D Representations

Niu, Dantong; Sharma, Yuvan; Xue, Haoru; Biamby, Giscard; Zhang, Junyi; Ji, Ziteng; Darrell, Trevor; Herzig, Roei

arXiv.org Artificial Intelligence 

Foundation models pre-trained on massive unlabeled datasets have revolutionized natural language and computer vision, exhibiting remarkable generalization capabilities, thus highlighting the importance of pre-training. Yet, efforts in robotics have struggled to achieve similar success, limited by either the need for costly robotic annotations or the lack of representations that effectively model the physical world. In this paper, we introduce ARM4R, an Auto-regressive Robotic Model that leverages low-level 4D Representations learned from human video data to yield a better pretrained robotic model. Specifically, we focus on utilizing 3D point tracking representations from videos derived by lifting 2D representations into 3D space via monocular depth estimation across time. These 4D representations maintain a shared ...

This could potentially be attributed to the scarcity of large-scale, diverse robotic data, unlike the abundance of text and image data available for vision and language FMs. The lack of robotic data poses a significant bottleneck in training foundation models that can effectively generalize across diverse robotic platforms and tasks. To overcome this limitation, several recent approaches (Xiao et al., 2022; Ye et al., 2024) employ representation learning by pre-training on an abundance of human data, enabling transfer to robotic systems. These approaches aim to recognize the inherent similarities between human and robot manipulation tasks and exploit the vast repositories of human video data available on the internet. Yet, these approaches have not been able to demonstrate effective generalization to downstream tasks. In part, this is due to their representations lacking an understanding of the physical world (Zhen et al., 2024a), and therefore being less effective for robotics.
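The abstract's idea of lifting 2D point tracks into 3D space with per-frame depth can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a standard pinhole camera model, and the intrinsics (fx, fy, cx, cy), the function names, and the sample values are all hypothetical. It also assumes that 2D tracks and depth values sampled at the tracked pixels are already available, for example from an off-the-shelf point tracker and a monocular depth estimator.

```python
from typing import List, Tuple

Point2D = Tuple[float, float]          # pixel coordinates (u, v)
Point3D = Tuple[float, float, float]   # camera-frame coordinates (x, y, z)


def lift_point(u: float, v: float, depth: float,
               fx: float, fy: float, cx: float, cy: float) -> Point3D:
    """Back-project one pixel and its depth through a pinhole camera (assumed model)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def lift_tracks(tracks_2d: List[List[Point2D]],
                depths: List[List[float]],
                fx: float, fy: float, cx: float, cy: float) -> List[List[Point3D]]:
    """Lift 2D tracks indexed [frame][point] to 3D tracks with the same indexing.

    tracks_2d[t][n] is the pixel location of point n in frame t;
    depths[t][n] is the estimated depth sampled at that pixel.
    """
    if len(tracks_2d) != len(depths):
        raise ValueError("tracks and depths must cover the same frames")
    tracks_3d = []
    for frame_pts, frame_depths in zip(tracks_2d, depths):
        if len(frame_pts) != len(frame_depths):
            raise ValueError("each frame needs one depth per tracked point")
        tracks_3d.append([
            lift_point(u, v, d, fx, fy, cx, cy)
            for (u, v), d in zip(frame_pts, frame_depths)
        ])
    return tracks_3d


# Two frames, two tracked points, hypothetical intrinsics and depths.
tracks = [[(320.0, 240.0), (420.0, 240.0)],
          [(330.0, 240.0), (420.0, 290.0)]]
depths = [[2.0, 1.0],
          [2.0, 1.0]]
points_4d = lift_tracks(tracks, depths, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
```

The result is a sequence of 3D points per frame, that is, 3D positions indexed by time, which is the sense in which such tracks can be called 4D. Monocular depth is generally known only up to scale, so a real pipeline would need to handle scale consistency across frames; this sketch ignores that.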
