3D-IntPhys: Towards More Generalized 3D-grounded Visual Intuitive Physics under Challenging Scenes
Antonio Torralba, Joshua Tenenbaum, Daniel Yamins
Neural Information Processing Systems
Given a visual scene, humans have strong intuitions about how the scene can evolve over time under given actions. This intuition, often termed visual intuitive physics, is a critical ability that allows us to make effective plans to manipulate the scene to achieve desired outcomes without relying on extensive trial and error. In this paper, we present a framework capable of learning 3D-grounded visual intuitive physics models from videos of complex scenes. Our method is composed of a conditional Neural Radiance Field (NeRF)-style visual frontend and a 3D point-based dynamics prediction backend, which allow us to impose strong relational and structural inductive biases that capture the structure of the underlying environment. Unlike existing point-based dynamics works that rely on supervision of dense point trajectories from simulators, we relax this requirement and only assume access to multi-view RGB images and (imperfect) instance masks acquired using a color prior.
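Below is a minimal, hypothetical sketch of the two-stage pipeline described in the abstract, assuming a PyTorch implementation: a conditional NeRF-style frontend that turns multi-view RGB observations into a 3D particle set, and a point-based dynamics backend that predicts per-point motion from local neighborhood (relational) features. All module names, feature dimensions, and the k-NN message-passing scheme here are illustrative assumptions, not the authors' released code.

```python
# Hypothetical sketch of the two-stage pipeline: NeRF-style frontend + point dynamics backend.
import torch
import torch.nn as nn


class ConditionalNeRFFrontend(nn.Module):
    """Maps multi-view image features to per-point occupancy, from which a
    3D particle set can be extracted (assumed interface, not the paper's)."""

    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.img_encoder = nn.Sequential(          # stand-in for a conv image encoder
            nn.Conv2d(3, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.density_head = nn.Sequential(         # image-conditioned radiance-field MLP
            nn.Linear(3 + feat_dim, 128), nn.ReLU(), nn.Linear(128, 1),
        )

    def forward(self, views: torch.Tensor, query_pts: torch.Tensor) -> torch.Tensor:
        # views: (V, 3, H, W) multi-view RGB; query_pts: (N, 3) world coordinates.
        cond = self.img_encoder(views).mean(dim=0)               # (feat_dim,)
        cond = cond.expand(query_pts.shape[0], -1)               # (N, feat_dim)
        density = self.density_head(torch.cat([query_pts, cond], dim=-1))
        return density.squeeze(-1)                               # (N,) occupancy logits


class PointDynamicsBackend(nn.Module):
    """Predicts per-point displacement from k-NN neighborhoods, a simple
    relational inductive bias over the particle set."""

    def __init__(self, k: int = 8):
        super().__init__()
        self.k = k
        self.edge_mlp = nn.Sequential(nn.Linear(6, 64), nn.ReLU(), nn.Linear(64, 64))
        self.node_mlp = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 3))

    def forward(self, pts: torch.Tensor) -> torch.Tensor:
        # pts: (N, 3) current particle positions -> (N, 3) predicted next positions.
        dist = torch.cdist(pts, pts)                                      # (N, N)
        knn = dist.topk(self.k + 1, largest=False).indices[:, 1:]         # drop self
        neighbors = pts[knn]                                              # (N, k, 3)
        rel = neighbors - pts.unsqueeze(1)                                # relative offsets
        edges = torch.cat([pts.unsqueeze(1).expand_as(rel), rel], dim=-1) # (N, k, 6)
        msg = self.edge_mlp(edges).mean(dim=1)                            # aggregate messages
        return pts + self.node_mlp(msg)                                   # next positions


if __name__ == "__main__":
    views = torch.rand(4, 3, 64, 64)                  # 4 toy camera views
    query = torch.rand(256, 3)                        # candidate 3D sample points
    frontend, backend = ConditionalNeRFFrontend(), PointDynamicsBackend()
    occupancy = frontend(views, query)                # (256,) occupancy logits
    particles = query[occupancy.topk(128).indices]    # keep the most "occupied" points
    next_particles = backend(particles)               # one-step dynamics prediction
    print(particles.shape, next_particles.shape)
```

In this sketch the frontend stands in for the conditional NeRF that grounds the scene in 3D, and the backend stands in for the point-based dynamics model; the actual method learns both from multi-view RGB videos and imperfect instance masks rather than from simulator point trajectories.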
- Technology:
  - Information Technology > Artificial Intelligence
    - Machine Learning > Neural Networks (1.00)
    - Representation & Reasoning (1.00)
    - Robots (1.00)
    - Vision (1.00)