3D Former: Monocular Scene Reconstruction with 3D SDF Transformers

Weihao Yuan, Xiaodong Gu, Heng Li, Zilong Dong, Siyu Zhu

arXiv.org Artificial Intelligence 

Monocular 3D reconstruction is a classical task in computer vision and is essential for numerous applications such as autonomous navigation, robotics, and augmented/virtual reality. The task aims to reconstruct an accurate and complete dense 3D shape of an unstructured scene from only a sequence of monocular RGB images. While camera poses can be estimated accurately with state-of-the-art SLAM (Campos et al., 2021) or SfM systems (Schönberger & Frahm, 2016), dense 3D scene reconstruction from these posed images remains challenging due to the complex geometry of large-scale environments, including varied objects, flexible lighting, reflective surfaces, and diverse cameras with different focal lengths, distortion, and sensor noise. Many previous methods reconstruct the scene in a multi-view depth manner (Yao et al., 2018; Chen et al., 2019; Duzceker et al., 2021): they predict a dense depth map for each target frame, which can estimate accurate local geometry but requires additional effort to fuse these depth maps (Murez et al., 2020; Sun et al., 2021), e.g., resolving inconsistencies between different views. Recently, some methods have instead directly regressed the complete 3D surface of the entire scene (Murez et al., 2020; Sun et al., 2021) in a truncated signed distance function (TSDF) representation.
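To make the TSDF representation concrete, the following is a minimal sketch that builds a TSDF for an analytic sphere on a regular voxel grid; in practice, reconstruction methods like those cited above estimate these values from posed images rather than from a known shape. The function name tsdf_sphere and the parameters grid_size, radius, and tau (the truncation band) are illustrative choices, not from the paper.

```python
import numpy as np

def tsdf_sphere(grid_size=64, radius=0.5, tau=0.1):
    # A TSDF stores, at each voxel, the signed distance to the nearest
    # surface, truncated to a band [-tau, tau] and normalized to [-1, 1].
    # Here the "surface" is an analytic sphere purely for illustration.

    # Sample voxel centers on a regular grid covering [-1, 1]^3.
    coords = np.linspace(-1.0, 1.0, grid_size)
    x, y, z = np.meshgrid(coords, coords, coords, indexing="ij")

    # Signed distance to the sphere: negative inside, positive outside.
    sdf = np.sqrt(x**2 + y**2 + z**2) - radius

    # Truncate to the band and normalize, so values far from the surface
    # saturate at +/-1 and only the near-surface band carries geometry.
    return np.clip(sdf / tau, -1.0, 1.0)

tsdf = tsdf_sphere()
print(tsdf.shape, tsdf.min(), tsdf.max())  # (64, 64, 64) -1.0 1.0
```

The reconstructed surface is the zero level set of this grid, which standard tools such as marching cubes can convert into a triangle mesh; truncation keeps the representation focused on the band near the surface, which is what makes it practical for dense scene reconstruction.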
