Vysics: Object Reconstruction Under Occlusion by Fusing Vision and Contact-Rich Physics
Bibit Bianchini, Minghan Zhu, Mengti Sun, Bowen Jiang, Camillo J. Taylor, Michael Posa
arXiv.org Artificial Intelligence
The first two authors contributed equally to this work.

Abstract: We introduce Vysics, a vision-and-physics framework for a robot to build an expressive geometry and dynamics model of a single rigid body, using a seconds-long RGBD video and the robot's proprioception. While the computer vision community has built powerful visual 3D perception algorithms, cluttered environments with heavy occlusions can limit the visibility of objects of interest. However, observed motion of partially occluded objects can imply physical interactions took place, such as contact with a robot or the environment. These inferred contacts can supplement the visible geometry with "physible geometry," which best explains the observed object motion through physics.

Vysics uses a vision-based tracking and reconstruction method, BundleSDF, to estimate the trajectory and the visible geometry from an RGBD video, and an odometry-based model learning method, Physics Learning Library (PLL), to infer the "physible" geometry from the trajectory through implicit contact dynamics optimization. The visible and "physible" geometries jointly factor into optimizing a signed distance function (SDF) to represent the object shape. Vysics does not require pretraining, nor tactile or force sensors. Compared with vision-only methods, Vysics yields object models with higher geometric accuracy and better dynamics prediction in experiments where the object interacts with the robot and the environment under heavy occlusion.

I. INTRODUCTION

In-the-wild manipulation will require robots to encounter a vast array of different objects. While some might be recognized from an existing database, others will require physical interaction to be newly understood on the spot.
Dexterous manipulation of these objects will benefit from the ability to rapidly learn or identify object properties: geometry is most critical, but inertial properties are also valuable for predicting motion, particularly under forceful manipulation. Using such object models offers interpretability and expected generalizability, at the cost of first having to obtain the model. This paper presents Vysics, which builds dynamics models of novel objects from vision and physical interaction, even in the face of substantial visual occlusions (Figure 1). Rapid modeling requires combining all available information in a unified fashion.
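
To make the idea of combining visible and contact-implied geometry in one shape objective concrete, here is a minimal toy sketch. It is not the Vysics loss, and it does not use BundleSDF or PLL. It assumes an analytic, origin-centered box SDF in place of a learned SDF, and it assumes the contact points are simply given rather than inferred through contact dynamics optimization. The names box_sdf, surface_loss, joint_loss, and the weights w_vision and w_contact are hypothetical and chosen only for illustration.

```python
import numpy as np


def box_sdf(points, half_extents):
    """Signed distance from points (N, 3) to an origin-centered, axis-aligned box."""
    q = np.abs(points) - np.asarray(half_extents)
    outside = np.linalg.norm(np.maximum(q, 0.0), axis=1)  # > 0 outside the box
    inside = np.minimum(np.max(q, axis=1), 0.0)           # < 0 inside the box
    return outside + inside


def surface_loss(points, half_extents):
    """Mean squared SDF value; zero when every point lies on the box surface."""
    return float(np.mean(box_sdf(points, half_extents) ** 2))


def joint_loss(half_extents, visible_pts, contact_pts, w_vision=1.0, w_contact=1.0):
    """Toy joint objective (an assumption): both point sets should sit on the surface."""
    return (w_vision * surface_loss(visible_pts, half_extents)
            + w_contact * surface_loss(contact_pts, half_extents))


# Two candidate shapes that differ only along x, the occluded direction.
true_box = np.array([0.10, 0.05, 0.03])
too_wide = np.array([0.20, 0.05, 0.03])

# "Visible" geometry: a patch of the top face (z = 0.03) that the camera can see.
xs, ys = np.meshgrid(np.linspace(-0.05, 0.05, 5), np.linspace(-0.04, 0.04, 5))
visible_pts = np.column_stack([xs.ravel(), ys.ravel(), np.full(xs.size, 0.03)])

# Contact-implied geometry: points on the hidden +x face (x = 0.10),
# standing in for contacts inferred from the observed motion.
ys_c, zs_c = np.meshgrid(np.linspace(-0.02, 0.02, 3), np.linspace(-0.01, 0.01, 3))
contact_pts = np.column_stack([np.full(ys_c.size, 0.10), ys_c.ravel(), zs_c.ravel()])

for name, box in [("true_box", true_box), ("too_wide", too_wide)]:
    print(name,
          "vision-only:", surface_loss(visible_pts, box),
          "joint:", joint_loss(box, visible_pts, contact_pts))
```

In this toy setup the visible patch fits both candidates equally well, so a vision-only term cannot tell them apart, while the contact term penalizes the candidate whose hidden face is in the wrong place. A learned SDF and contacts inferred from dynamics would replace the analytic box and the hand-placed contact points in any realistic setting.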
Apr-29-2025