Building 3D Representations and Generating Motions From a Single Image via Video-Generation
Neural Information Processing Systems
Autonomous robots typically need to construct representations of their surroundings and adapt their motions to the geometry of their environment. Here, we tackle the problem of constructing a policy model for collision-free motion generation, consistent with the environment, from a single input RGB image. Extracting 3D structures from a single image often involves monocular depth estimation. Developments in depth estimation have given rise to large pre-trained models such as DepthAnything. However, using outputs of these models for downstream motion generation is challenging due to frustum-shaped errors that arise.
Jun-13-2026, 23:37:45 GMT
- Technology:
- Information Technology > Artificial Intelligence > Robots (0.96)