Bootstrap Dynamic-Aware 3D Visual Representation for Scalable Robot Learning
Qiwei Liang, Boyang Cai, Minghao Lai, Sitong Zhuang, Tao Lin, Yan Qin, Yixuan Ye, Jiaming Liang, Renjing Xu
–arXiv.org Artificial Intelligence
Despite strong results on recognition and segmentation, current 3D visual pre-training methods often underperform on robotic manipulation. We attribute this gap to two factors: the lack of state-action-state dynamics modeling and the unnecessary redundancy of explicit geometric reconstruction. We introduce AFRO, a self-supervised framework that learns dynamics-aware 3D representations without action or reconstruction supervision. AFRO casts state prediction as a generative diffusion process and jointly models forward and inverse dynamics in a shared latent space to capture causal transition structure. To prevent feature leakage in action learning, we employ feature differencing and inverse-consistency supervision, improving the quality and stability of visual features. When combined with Diffusion Policy, AFRO substantially increases manipulation success rates across 16 simulated and 4 real-world tasks, outperforming existing pre-training approaches. The framework also scales favorably with data volume and task complexity. Qualitative visualizations indicate that AFRO learns semantically rich, discriminative features, offering an effective pre-training solution for 3D representation learning in robotics. Project page: https://kolakivy.github.io/AFRO/
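The abstract describes two coupled objectives: an inverse-dynamics head that predicts the action from a *difference* of latent features (so static scene content cancels and cannot leak the action), and a forward-dynamics head that predicts the next latent from the current latent and that action, with an inverse-consistency check tying the two together. A minimal sketch of this joint forward/inverse structure is below; all names, dimensions, and the linear stand-in "networks" are hypothetical, and AFRO's actual state predictor is a generative diffusion model over 3D features, which is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical; stand-ins for AFRO's 3D encoder and heads).
D_STATE, D_LATENT, D_ACTION = 8, 4, 2

# Random linear maps standing in for learned modules.
W_enc = rng.normal(size=(D_STATE, D_LATENT))              # shared encoder
W_inv = rng.normal(size=(D_LATENT, D_ACTION))             # inverse-dynamics head
W_fwd = rng.normal(size=(D_LATENT + D_ACTION, D_LATENT))  # forward-dynamics head


def encode(s):
    """Embed a raw state into the shared latent space."""
    return s @ W_enc


def inverse_dynamics(z_t, z_next):
    """Feature differencing: predict the action from (z_next - z_t),
    so content shared by both frames cancels and cannot leak the action."""
    return (z_next - z_t) @ W_inv


def forward_dynamics(z_t, a_t):
    """Predict the next latent from the current latent and the action."""
    return np.concatenate([z_t, a_t]) @ W_fwd


# One (state, next-state) transition.
s_t, s_next = rng.normal(size=D_STATE), rng.normal(size=D_STATE)
z_t, z_next = encode(s_t), encode(s_next)

a_hat = inverse_dynamics(z_t, z_next)       # inverse model: (z_t, z_{t+1}) -> a
z_next_hat = forward_dynamics(z_t, a_hat)   # forward model: (z_t, a) -> z_{t+1}

# Inverse-consistency: re-deriving the action from the predicted next latent
# should agree with a_hat; a training loss would penalize the gap.
a_cycle = inverse_dynamics(z_t, z_next_hat)
consistency_gap = np.linalg.norm(a_cycle - a_hat)
```

In the paper both heads are trained jointly in the shared latent space, which is what lets the representation capture causal transition structure without explicit geometric reconstruction.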
Dec-5-2025
- Country:
  - Asia > China
    - Beijing > Beijing (0.04)
    - Guangdong Province
    - Hong Kong (0.04)
    - Ningxia Hui Autonomous Region > Yinchuan (0.04)
- Genre:
  - Research Report (0.82)
- Industry:
  - Education (0.34)
- Technology:
  - Information Technology > Artificial Intelligence
    - Machine Learning (1.00)
    - Robots (1.00)