Visual-Geometry Diffusion Policy: Robust Generalization via Complementarity-Aware Multimodal Fusion
Yikai Tang, Haoran Geng, Sheng Zang, Pieter Abbeel, Jitendra Malik
arXiv.org Artificial Intelligence
Visual-Geometry Diffusion Policy (VGDP) is an imitation learning method that fuses 3D observations with 2D images through a Complementarity-Aware Fusion Module, which uses modality-wise dropout to enforce balanced use of RGB and geometry. This design yields substantial improvements in average performance, generalization, and robustness. VGDP is evaluated extensively in both simulation and the real world, across a wide range of tasks and under both visual and spatial randomizations.

Abstract: Imitation learning has emerged as a crucial approach for acquiring visuomotor skills from demonstrations, where designing effective observation encoders is essential for policy generalization. However, existing methods often overfit and struggle to generalize under spatial and visual randomizations. To address this challenge, we propose Visual-Geometry Diffusion Policy (VGDP), a multimodal imitation learning framework built around a Complementarity-Aware Fusion Module, in which modality-wise dropout enforces balanced use of RGB and point-cloud cues and cross-attention serves as a lightweight interaction layer. Our experiments show that the expressiveness of the fused latent space is largely induced by the complementarity that modality-wise dropout enforces, with cross-attention acting primarily as a lightweight interaction mechanism rather than as the main source of robustness. Across a benchmark of 18 simulated tasks and 4 real-world tasks, VGDP outperforms seven baseline policies with an average performance improvement of 39.1%.
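The abstract names two ingredients, modality-wise dropout and a cross-attention interaction layer, without giving implementation details. The NumPy sketch below is one possible reading, not the authors' implementation: the function names, the dropout probability `p_drop`, the rule that at most one modality is dropped per sample, the projection-free single-head attention, and the residual fusion are all assumptions made for illustration.

```python
import numpy as np

def modality_dropout(rgb_feat, geo_feat, p_drop=0.3, rng=None):
    """Zero out at most one modality per sample (assumed scheme, not from the paper)."""
    rng = np.random.default_rng() if rng is None else rng
    batch = rgb_feat.shape[0]
    # Per sample: 0 = keep both, 1 = drop RGB, 2 = drop geometry.
    choice = rng.choice(3, size=batch, p=[1.0 - p_drop, p_drop / 2, p_drop / 2])
    keep_rgb = (choice != 1).astype(rgb_feat.dtype)
    keep_geo = (choice != 2).astype(geo_feat.dtype)
    # Reshape the per-sample masks so they broadcast over token and channel axes.
    rgb_mask = keep_rgb.reshape(batch, *([1] * (rgb_feat.ndim - 1)))
    geo_mask = keep_geo.reshape(batch, *([1] * (geo_feat.ndim - 1)))
    return rgb_feat * rgb_mask, geo_feat * geo_mask, choice

def cross_attention(query_tokens, context_tokens):
    """Single-head scaled dot-product attention without learned projections."""
    d = query_tokens.shape[-1]
    scores = query_tokens @ np.swapaxes(context_tokens, -1, -2) / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ context_tokens

def fuse(rgb_feat, geo_feat):
    """Residual fusion: RGB tokens attend to geometry tokens (hypothetical design)."""
    return rgb_feat + cross_attention(rgb_feat, geo_feat)

# Usage: 8 samples, 4 RGB tokens and 5 point-cloud tokens, 16 channels each.
rng = np.random.default_rng(0)
rgb = rng.normal(size=(8, 4, 16))
geo = rng.normal(size=(8, 5, 16))
rgb_d, geo_d, choice = modality_dropout(rgb, geo, p_drop=0.3, rng=rng)
fused = fuse(rgb_d, geo_d)  # shape (8, 4, 16)
```

Because a sample never loses both modalities in this sketch, the fused tokens always carry some signal: when RGB is zeroed the attention weights become uniform and the output is the mean of the geometry tokens, and when geometry is zeroed the residual path passes the RGB tokens through unchanged. In training, a downstream policy would therefore see inputs where either modality alone must suffice, which is the intuition behind "enforced complementarity" as described in the abstract.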
Dec-1-2025
- Country:
- North America > United States > California (0.14)
- Genre:
- Research Report (0.46)
- Industry:
- Education (0.46)
- Technology:
- Information Technology > Artificial Intelligence
- Robots (1.00)
- Machine Learning (1.00)