General Flow as Foundation Affordance for Scalable Robot Learning

Chengbo Yuan, Chuan Wen, Tong Zhang, Yang Gao

arXiv.org Artificial Intelligence 

Figure 1: We propose General Flow as Foundation Affordance. We analyze its properties and applications to reveal its potential, design a scale-aware algorithm for general flow prediction, and achieve stable zero-shot cross-embodiment skill transfer in the real world.

Abstract—We address the challenge of acquiring real-world manipulation skills with a scalable framework. Inspired by the success of large-scale auto-regressive prediction in Large Language Models (LLMs), we hold the belief that identifying an appropriate prediction target capable of leveraging large-scale datasets is crucial. Therefore, we propose to utilize flow, which represents the future trajectories of 3D points on objects of interest, as an ideal prediction target in robot learning. To exploit scalable data resources, we turn our attention to cross-embodiment datasets, and we develop, for the first time, a language-conditioned flow prediction model trained on them. The predicted flow offers actionable guidance, thus facilitating stable zero-shot skill transfer in real-world manipulation. We deploy our method with a policy based on closed-loop flow prediction. Remarkably, without any additional training, our method achieves an impressive 81% success rate in human-to-robot skill transfer, covering 18 tasks in 6 scenes. Our framework features three key properties: (1) scalability: leveraging cross-embodiment data resources; (2) universality: covering multiple object categories, including rigid, articulated, and soft bodies; (3) stable skill transfer: providing actionable guidance for execution. These lead to a new pathway towards scalable general robot learning.

We aim to reveal a potential pathway for replicating the success of Large Language Models (LLMs) in the domain of robot learning. Specifically, we are interested in developing a new framework that enables scalable learning for robot manipulation. In the future, this framework has the potential to progressively enhance the capabilities of robots, following the scaling law that has been observed in LLMs [82]. Inspired by the LLMs training paradigm [14], we believe that two key elements contribute to their strong generalization abilities: (1) a vast training dataset, and (2) an appropriate prediction target.

We first develop pipelines to extract 3D flow labels directly from RGBD human video datasets. We find that predicting dense flow in real-world scene point clouds remains a formidable challenge, primarily due to the variability of trajectory scales and the need for robustness in zero-shot settings. To address these issues, we employ scale-aware strategies in both the data and model aspects, complemented by augmentation techniques that focus on embodiment occlusion (the human hand and robot arm) and query point sampling (3D points on objects of interest), thereby boosting zero-shot stability.
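As a minimal illustration of the flow representation discussed above — the future trajectories of 3D points on objects of interest — the following NumPy sketch builds toy flow labels from tracked 3D points and applies one plausible scale-aware normalization (dividing each trajectory by its total path length, so near-static and large-motion points share a common scale). All function names, array shapes, and the normalization scheme are illustrative assumptions, not the paper's actual pipeline or model.

```python
import numpy as np

def extract_flow_labels(point_tracks):
    """Toy stand-in for a flow-label extraction pipeline.

    point_tracks: (T, N, 3) array of N 3D points tracked over T frames
    (in practice these would come from tracking in RGBD human videos).
    Returns the query points at t=0 and each point's future trajectory
    expressed as displacements from its current position.
    """
    current = point_tracks[0]                       # (N, 3) query points
    future = point_tracks[1:]                       # (T-1, N, 3)
    flow = future - current[None]                   # displacement trajectories
    return current, np.transpose(flow, (1, 0, 2))   # (N, 3), (N, T-1, 3)

def normalize_scale(flow, eps=1e-6):
    """One possible scale-aware treatment (an assumption, not the
    paper's exact method): rescale each trajectory by its total path
    length so trajectories of very different magnitudes are comparable.
    """
    # Per-step displacements, including the step from the origin.
    steps = np.diff(
        np.concatenate([np.zeros_like(flow[:, :1]), flow], axis=1), axis=1
    )
    length = np.linalg.norm(steps, axis=-1).sum(axis=-1, keepdims=True)  # (N, 1)
    scale = np.maximum(length, eps)[..., None]      # (N, 1, 1), avoid div-by-zero
    return flow / scale, scale
```

A policy could then query `extract_flow_labels` on object points, predict the normalized flow, and multiply by the (predicted or estimated) scale before executing, re-predicting in a closed loop after each step.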