Object-Centric Latent Action Learning
Klepach, Albina; Nikulin, Alexander; Zisman, Ilya; Tarasov, Denis; Derevyagin, Alexander; Polubarov, Andrei; Lyubaykin, Nikita; Kurenkov, Vladislav
arXiv.org Artificial Intelligence
Leveraging vast amounts of internet video data for Embodied AI is currently bottlenecked by the lack of action annotations and the presence of action-correlated distractors. We propose a novel object-centric latent action learning approach, based on VideoSaur and LAPO, that employs self-supervised decomposition of scenes into object representations and annotates video data with proxy-action labels. This method effectively disentangles causal agent-object interactions from irrelevant background noise and reduces the performance degradation that distractors cause in latent action learning approaches. Our preliminary experiments with the Distracting Control Suite show that latent action pretraining based on object decompositions improves the quality of inferred latent actions by a factor of 2.7 and improves the efficiency of downstream fine-tuning with a small set of labeled actions, increasing return by a factor of 2.6 on average.
In recent years, the scaling of model and data sizes has led to the creation of powerful and general foundation models (Bommasani et al., 2021) that have enabled many breakthroughs in understanding and generating natural language (Achiam et al., 2023; Brown et al., 2020) and images (Dehghani et al., 2023; Radford et al., 2021). The field of embodied AI, on the other hand, has generally lagged behind in generalization and emergent abilities (Guruprasad et al., 2024), mostly because of the lack of diverse data for pre-training (Lin et al., 2024). The vast amount of video data on the Internet, covering a wide variety of human activities, could potentially meet these data needs (McCarthy et al., 2024).
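The latent action pipeline summarized in the abstract (infer a proxy action from each pair of consecutive observations, then map proxy actions to real actions using only a few labels) can be sketched in a few lines. This is a minimal illustration under loudly stated assumptions, not the authors' implementation: the slot vectors are synthetic stand-ins for what an object-centric encoder such as VideoSaur would produce; the inverse dynamics model (IDM) and forward dynamics model (FDM) are linear with a small continuous bottleneck, rather than the neural networks and quantized bottleneck used in LAPO; the distractors drift independently instead of being action-correlated; and every function name (`make_synthetic_slots`, `train_latent_action_model`, `fit_action_decoder`) is hypothetical.

```python
import numpy as np


def make_synthetic_slots(n, num_slots=3, slot_dim=4, noise=0.1, seed=0):
    """Hypothetical stand-in for per-frame object slots (e.g. from VideoSaur).

    Slot 0 plays the agent: its first two features shift by a 2-D action.
    Other slots are distractors that drift with small independent noise.
    Returns flattened slots at t and t+1 plus the ground-truth actions.
    """
    rng = np.random.default_rng(seed)
    x_t = rng.normal(size=(n, num_slots, slot_dim))
    actions = rng.normal(size=(n, 2))
    delta = noise * rng.normal(size=(n, num_slots, slot_dim))
    delta[:, 0, :2] = actions   # the agent slot moves with the action
    delta[:, 0, 2:] = 0.0       # the rest of the agent slot stays fixed
    x_next = x_t + delta
    return x_t.reshape(n, -1), x_next.reshape(n, -1), actions


def train_latent_action_model(x_t, x_next, latent_dim=2, lr=0.05, steps=2000, seed=1):
    """LAPO-style objective with linear models (a simplification, assumed here).

    IDM: z = W_idm @ [x_t, x_next]            (low-dimensional bottleneck)
    FDM: x_next_hat = x_t + W_fdm @ [x_t, z]  (predicts the change in slots)
    Both are trained jointly by gradient descent on mean squared error.
    """
    rng = np.random.default_rng(seed)
    n, f = x_t.shape
    W_idm = 0.01 * rng.normal(size=(latent_dim, 2 * f))
    W_fdm = 0.01 * rng.normal(size=(f, f + latent_dim))
    idm_in = np.concatenate([x_t, x_next], axis=1)
    losses = []
    for _ in range(steps):
        z = idm_in @ W_idm.T                           # latent actions, (n, latent_dim)
        fdm_in = np.concatenate([x_t, z], axis=1)      # (n, f + latent_dim)
        resid = x_t + fdm_in @ W_fdm.T - x_next        # prediction error, (n, f)
        losses.append(float(np.mean(np.sum(resid ** 2, axis=1))))
        g_pred = 2.0 * resid / n                       # dLoss / dprediction
        g_fdm = g_pred.T @ fdm_in                      # gradient for the FDM weights
        g_z = g_pred @ W_fdm[:, f:]                    # backpropagate into z
        g_idm = g_z.T @ idm_in                         # gradient for the IDM weights
        W_fdm -= lr * g_fdm
        W_idm -= lr * g_idm
    return W_idm, losses


def fit_action_decoder(z, actions):
    """Least-squares map from latent actions (plus a bias) to real actions."""
    z1 = np.concatenate([z, np.ones((len(z), 1))], axis=1)
    W, *_ = np.linalg.lstsq(z1, actions, rcond=None)
    return W


x_t, x_next, actions = make_synthetic_slots(512)
W_idm, losses = train_latent_action_model(x_t, x_next)
z = np.concatenate([x_t, x_next], axis=1) @ W_idm.T   # proxy-action labels for all transitions
decoder = fit_action_decoder(z[:32], actions[:32])     # only 32 labeled transitions
pred = np.concatenate([z, np.ones((len(z), 1))], axis=1) @ decoder
decoder_mse = float(np.mean((pred - actions) ** 2))
print(f"FDM loss {losses[0]:.3f} -> {losses[-1]:.3f}, action MSE {decoder_mse:.3f}")
```

The two-dimensional bottleneck does the work in this sketch: with room for only two numbers per transition, the IDM encodes the largest change between frames (here the agent's motion) and leaves the small distractor drift in the reconstruction error. If distractors moved as much as the agent, or in step with it, the bottleneck could encode them instead, which is one plausible reading of the degradation the abstract attributes to distractors.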
Feb-13-2025