3D-IntPhys: Towards More Generalized 3D-grounded Visual Intuitive Physics under Challenging Scenes
Antonio Torralba, Joshua Tenenbaum, Daniel Yamins

May-28-2025, 12:27:22 GMT–Neural Information Processing Systems

Given a visual scene, humans have strong intuitions about how the scene can evolve over time under given actions. This intuition, often termed visual intuitive physics, is a critical ability that allows us to plan effective manipulations of the scene to achieve desired outcomes without relying on extensive trial and error. In this paper, we present a framework capable of learning 3D-grounded visual intuitive physics models from videos of complex scenes. Our method is composed of a conditional Neural Radiance Field (NeRF)-style visual frontend and a 3D point-based dynamics prediction backend, through which we can impose strong relational and structural inductive biases to capture the structure of the underlying environment. Unlike existing point-based intuitive physics models, which rely on supervision from dense point trajectories obtained from simulators, we relax these requirements and assume access only to multi-view RGB images and (imperfect) instance masks acquired using a color prior.
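The abstract names a point-based dynamics backend with relational inductive bias but gives no architectural details. As a rough illustration of that general pattern (in the style of graph-based particle simulators, and not the paper's implementation), the Python sketch below builds a radius graph over 3D points, sums per-edge messages that depend only on relative offsets, and integrates positions with semi-implicit Euler. All function names are hypothetical, and the hand-written spring in `edge_message` is an assumed placeholder for a learned edge network. The abstract does not say how points are obtained from the NeRF-style frontend, so here they are simply given as input.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def scale(a: Vec3, s: float) -> Vec3:
    return (a[0] * s, a[1] * s, a[2] * s)


def norm(a: Vec3) -> float:
    return math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)


def radius_graph(points: List[Vec3], radius: float) -> List[Tuple[int, int]]:
    """Directed (receiver, sender) edges between distinct points closer than `radius`."""
    return [
        (i, j)
        for i, p in enumerate(points)
        for j, q in enumerate(points)
        if i != j and norm(sub(p, q)) < radius
    ]


def edge_message(p_recv: Vec3, p_send: Vec3, rest_length: float) -> Vec3:
    """Hypothetical stand-in for a learned edge network.

    It depends only on the offset between the two points (a relational bias),
    so the update is translation-invariant. Here it is a spring pulling the
    pair toward `rest_length`.
    """
    offset = sub(p_send, p_recv)
    dist = norm(offset)
    if dist == 0.0:
        return (0.0, 0.0, 0.0)
    return scale(offset, (dist - rest_length) / dist)


def dynamics_step(points, velocities, radius, rest_length, dt, gravity=(0.0, 0.0, -9.8)):
    """One message-passing step followed by semi-implicit Euler integration."""
    forces = [(0.0, 0.0, 0.0) for _ in points]
    for recv, send in radius_graph(points, radius):
        forces[recv] = add(forces[recv], edge_message(points[recv], points[send], rest_length))
    new_points, new_velocities = [], []
    for p, v, f in zip(points, velocities, forces):
        v_next = add(v, scale(add(f, gravity), dt))  # update velocity first
        new_velocities.append(v_next)
        new_points.append(add(p, scale(v_next, dt)))  # then position, using the new velocity
    return new_points, new_velocities
```

Because messages depend only on relative offsets, shifting every input point by the same vector shifts every output point by that vector. This translation equivariance is one concrete example of the kind of structural bias a point-based backend can build in, rather than having to learn it from data.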

artificial intelligence, machine learning, representation
