DM-OSVP++: One-Shot View Planning Using 3D Diffusion Models for Active RGB-Based Object Reconstruction

Sicong Pan, Liren Jin, Xuying Huang, Cyrill Stachniss, Marija Popović, Maren Bennewitz


Many autonomous robotic applications depend on accurate 3D models of objects to perform downstream tasks. These include object manipulation in household scenarios (Breyer et al. 2022; Dengler et al. 2023; Jauhri et al. 2024), harvesting and prediction of intervention actions in agriculture (Pan et al. 2023; Lenz et al. 2024; Yao et al. 2024), as well as solving jigsaw puzzles of fragmented frescoes in archaeology (Tsesmelis et al. 2024). For these applications, high-fidelity 3D object representations are critical to enable precise action execution and informed decision-making.

When deployed in initially unknown environments, robots often need to autonomously reconstruct 3D models of objects to understand their geometries, textures, positions, and orientations before taking action. Generating these models typically involves capturing data from multiple viewpoints using onboard sensors such as RGB or depth cameras. Acquiring data solely from predefined or randomly chosen sensor viewpoints is inefficient, as such strategies fail to adapt to the geometry and spatial distribution of the object to be reconstructed. This can lead to inferior reconstruction results, especially when objects are complex and contain self-occlusions.

To address this, we propose using active reconstruction strategies, in which object-specific sensor viewpoints are planned for data acquisition to achieve high-quality 3D object reconstruction. The key aspect of active reconstruction is view planning (Zeng et al. 2020a), i.e., generating the viewpoints that enable the robot to acquire the most informative sensor measurements.
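To make the notion of view planning concrete, the following is a minimal, illustrative sketch of a classical greedy next-best-view loop: candidate viewpoints are sampled on a sphere around the object, and at each step the view that newly covers the most surface points is selected. All names here (`fibonacci_sphere`, `greedy_view_plan`) and the simple dot-product visibility test are assumptions made for illustration; real planners use ray casting and volumetric information gain rather than raw point coverage, and this sequential baseline is exactly the kind of iterative planning that one-shot approaches avoid.

```python
import numpy as np

def fibonacci_sphere(n: int, radius: float = 1.0) -> np.ndarray:
    """Roughly uniform points on a sphere via a Fibonacci lattice."""
    i = np.arange(n) + 0.5
    z = 1.0 - 2.0 * i / n
    phi = np.pi * (3.0 - np.sqrt(5.0)) * i
    r = np.sqrt(1.0 - z**2)
    return radius * np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)

def visible(cam: np.ndarray, pts: np.ndarray, fov_cos: float) -> np.ndarray:
    """Crude visibility test: a point counts as seen if its direction toward
    the camera aligns with the camera's outward axis (a stand-in for proper
    ray casting and self-occlusion checks)."""
    to_cam = cam - pts
    to_cam /= np.linalg.norm(to_cam, axis=1, keepdims=True)
    return to_cam @ (cam / np.linalg.norm(cam)) > fov_cos

def greedy_view_plan(candidates, surface_pts, budget, fov_cos=0.5):
    """Pick `budget` views, each maximizing newly covered surface points."""
    unseen = np.ones(len(surface_pts), dtype=bool)
    plan = []
    for _ in range(budget):
        # Score every unused candidate by how many unseen points it covers.
        gains = [np.count_nonzero(visible(c, surface_pts, fov_cos) & unseen)
                 if i not in plan else -1
                 for i, c in enumerate(candidates)]
        best = int(np.argmax(gains))
        plan.append(best)
        # Mark the points covered by the chosen view as observed.
        unseen &= ~visible(candidates[best], surface_pts, fov_cos)
    return plan

# Toy usage: plan 5 views around a sphere of sampled surface points.
surface = fibonacci_sphere(500, radius=0.3)
views = fibonacci_sphere(64, radius=1.0)
print(greedy_view_plan(views, surface, budget=5))
```

Note that each greedy iteration requires integrating the previous measurement before the next view can be scored, which is why such pipelines demand repeated movement-measure-update cycles at deployment time.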