Object-centric 3DMotion Field for Robot Learning from Human Videos

Jun-23-2026, 07:25:04 GMT–Neural Information Processing Systems

Learning robot control policies from human videos is a promising direction for scaling up robot learning. However, how to extract action knowledge (or action representations) from videos for policy learning remains a key challenge. Existing action representations such as video frames, pixelflow, and pointcloud flow have inherent limitations such as modeling complexity or loss of information. In this paper, we propose to use object-centric 3D motion field to represent actions for robot learning from human videos, and present a novel framework for extracting this representation from videos for zero-shot control. We introduce two novel components in its implementation.

large language model, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Jun-23-2026, 07:25:04 GMT

Conferences PDF

Add feedback

Genre:
- Research Report > Experimental Study (1.00)

Industry:
- Information Technology (0.46)
- Media > Photography (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (0.67)
  - Robots > Manipulation (0.46)
  - Machine Learning > Neural Networks (0.46)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found