Stem-OB: Generalizable Visual Imitation Learning with Stem-Like Convergent Observation through Diffusion Inversion

Hu, Kaizhe; Rui, Zihang; He, Yao; Liu, Yuyao; Hua, Pu; Xu, Huazhe

Nov-13-2024–arXiv.org Artificial Intelligence

Figure 1: Left: The tree of Stem-OB inversion is composed of different objects progressively inverted through a diffusion inversion process. Moving downward along the tree's branches, objects of different textures, appearances, and categories gradually get closer, eventually converging into the same root of Gaussian noise, where they are completely indistinguishable.

Visual imitation learning (IL) methods demonstrate strong performance, yet they fail to generalize when faced with visual input perturbations such as variations in lighting and textures. This limitation hampers their practical application in real-world settings. To address this, we propose Stem-OB, which leverages the inversion process of pretrained image diffusion models to suppress low-level visual differences while maintaining high-level scene structures. This image inversion process is akin to transforming the observation into a shared representation from which other observations also stem. In contrast to data augmentation approaches, Stem-OB offers a simple yet effective plug-and-play solution. It demonstrates robustness to various unspecified appearance changes without the need for additional training. We provide theoretical insights and empirical results that validate the efficacy of our approach in simulated and real settings. Stem-OB shows a significant improvement in real-world robotic tasks with challenging lighting and appearance changes, raising success rates by an average of 22.2% over the best baseline. See our website for more info.

Despite the versatility demonstrated by visual IL, learned policies are often brittle and fail to generalize to unseen environments: even minor perturbations, such as altering the lighting conditions or changing the texture of an object, may cause the learned policy to fail (Xie et al., 2023; Yuan et al., 2024b).
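The convergence pictured in Figure 1 can be illustrated numerically. The following is a minimal sketch, not the Stem-OB method: Stem-OB inverts observations with a pretrained image diffusion model, whereas this example uses no model at all. It relies only on the closed-form DDPM forward process, x_t = sqrt(alpha_bar_t) x_0 + sqrt(1 - alpha_bar_t) eps. The linear beta schedule (1e-4 to 0.02 over 1000 steps), the random "observations", and the shared noise sample are all illustrative assumptions. Sharing the noise isolates the signal term, so the gap between two noised observations equals sqrt(alpha_bar_t) times their original gap and shrinks toward zero as t grows.

```python
import numpy as np

def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear (DDPM-style) beta schedule.

    The schedule values are assumed defaults, not taken from Stem-OB.
    """
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def noised(x0, t, alpha_bar, eps):
    """Closed-form forward noising: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

rng = np.random.default_rng(0)
# Two hypothetical observations: the second is a perturbed copy of the first,
# standing in for a low-level appearance change (e.g., texture or lighting).
obs_a = rng.uniform(-1.0, 1.0, size=(64, 64, 3))
obs_b = np.clip(obs_a + rng.normal(0.0, 0.3, size=obs_a.shape), -1.0, 1.0)
# Shared noise sample (an assumption), so only the signal term differs.
eps = rng.standard_normal(obs_a.shape)

alpha_bar = linear_alpha_bar()
steps = [0, 100, 250, 500, 999]
gaps = []
for t in steps:
    gap = np.linalg.norm(noised(obs_a, t, alpha_bar, eps) - noised(obs_b, t, alpha_bar, eps))
    gaps.append(gap)
    print(f"t={t:4d}  sqrt(alpha_bar)={np.sqrt(alpha_bar[t]):.4f}  gap={gap:.3f}")
```

With independent noise per observation the two noised images would not become numerically equal, but their distributions would both approach the same standard Gaussian, which corresponds to the shared "root" in Figure 1.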

artificial intelligence, inversion, machine learning

Genre:
- Research Report > New Finding (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (1.00)
  - Robots (1.00)