Building 3D Representations and Generating Motions From a Single Image via Video-Generation

Jun-13-2026, 23:37:45 GMT–Neural Information Processing Systems

Autonomous robots typically need to construct representations of their surroundings and adapt their motions to the geometry of their environment. Here, we tackle the problem of constructing a policy model for collision-free motion generation, consistent with the environment, from a single input RGB image. Extracting 3D structure from a single image often involves monocular depth estimation. Developments in depth estimation have given rise to large pre-trained models such as DepthAnything. However, using the outputs of these models for downstream motion generation is challenging due to the frustum-shaped errors that arise.
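To make the depth-to-3D step concrete, the following is a minimal sketch of pinhole back-projection, the standard way to lift a depth map into camera-frame 3D points. It is not the paper's method. The function name `backproject`, the intrinsics (`fx`, `fy`, `cx`, `cy`), and the toy depth values are all assumptions chosen for illustration. One plausible reading of "frustum-shaped errors" is that a depth error moves each point along its camera ray, so the errors fan out from the camera center; the sketch shows that effect and nothing more.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def backproject(depth: List[List[float]], fx: float, fy: float,
                cx: float, cy: float) -> List[Point]:
    """Lift a depth map to camera-frame 3D points with a pinhole model."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            # Pixel (u, v) defines a ray through the camera origin;
            # the depth z selects one point along that ray.
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Hypothetical 2x3 depth map of a flat wall 2 m away, and the same
# map with every depth overestimated by 50%.
true_depth = [[2.0, 2.0, 2.0], [2.0, 2.0, 2.0]]
bad_depth = [[z * 1.5 for z in row] for row in true_depth]

fx = fy = 100.0    # assumed focal lengths, in pixels
cx, cy = 1.0, 0.5  # assumed principal point, in pixels

true_pts = backproject(true_depth, fx, fy, cx, cy)
bad_pts = backproject(bad_depth, fx, fy, cx, cy)

for p, q in zip(true_pts, bad_pts):
    # Each erroneous point q is the true point p scaled along its ray.
    print(p, q)
```

Because x and y are both proportional to z, a depth error never shifts a point sideways in the image; it slides the point toward or away from the camera along its ray. The lateral displacement therefore grows with distance from the optical axis.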

Technology:
- Information Technology > Artificial Intelligence > Robots (0.96)