Object-centric 3D Motion Field for Robot Learning from Human Videos
Neural Information Processing Systems
Learning robot control policies from human videos is a promising direction for scaling up robot learning. However, extracting action knowledge (or action representations) from videos for policy learning remains a key challenge. Existing action representations such as video frames, pixel flow, and point cloud flow have inherent limitations, such as modeling complexity or loss of information. In this paper, we propose using an object-centric 3D motion field to represent actions for robot learning from human videos, and present a novel framework for extracting this representation from videos for zero-shot control. We introduce two novel components in its implementation.
Jun-23-2026, 07:25:04 GMT
- Genre:
- Research Report > Experimental Study (1.00)
- Industry:
- Information Technology (0.46)
- Media > Photography (0.46)
- Technology:
- Information Technology > Artificial Intelligence
- Natural Language > Large Language Model (0.67)
- Robots > Manipulation (0.46)
- Machine Learning > Neural Networks (0.46)