Light3R-SfM: Towards Feed-forward Structure-from-Motion

Sven Elflein, Qunjie Zhou, Sérgio Agostinho, Laura Leal-Taixé

arXiv.org Artificial Intelligence 

Structure-from-Motion (SfM) is the task of jointly recovering camera poses and reconstructing the 3D scene structure from a set of unconstrained images. This longstanding problem is essential to many computer vision applications, including novel view synthesis via NeRFs [3, 29] and 3DGS [20], multi-view stereo (MVS) reconstruction [31, 49], and visual localization [34, 36]. Traditional SfM methods generally follow two main approaches: incremental [37, 41, 56] and global [8, 30, 55] SfM. Both paradigms rely on key components such as feature detection and matching for correspondence search, 3D triangulation to reconstruct geometry from 2D correspondences, and joint optimization of camera poses and scene geometry through bundle adjustment. A major research direction has been to replace these components with learning-based modules, progressing towards fully end-to-end SfM [7, 40, 50].

To perform SfM from an image collection, DUSt3R works [22, 51] first compute stereo reconstruction exhaustively for all image pairs and then obtain globally aligned pointmaps for all cameras through joint optimization of pairwise rigid transformations and local pointmaps. This baseline has been significantly improved by the concurrent work MASt3R-SfM [12] that leverages image retrieval to drastically reduce the computation overhead, boosts optimization efficiency by optimizing only over the sparse pixel correspondences, and appends a global bundle adjustment stage for accuracy refinement. While optimization-based alignment has been proven to be the key to accurate 3D reconstruction by DUSt3R, MASt3R-SfM and classical SfM methods [25, 30, 37], this comes at the cost of slow runtime and extensive memory footprint even for moderately-sized image
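To make the cost of exhaustive pairwise reconstruction concrete, the following minimal Python sketch compares the number of image pairs in the exhaustive setting with a retrieval-based alternative. The functions `exhaustive_pairs` and `retrieval_pairs`, the similarity matrix, and the top-k selection rule are illustrative assumptions; the text does not specify how MASt3R-SfM performs retrieval.

```python
from itertools import combinations


def exhaustive_pairs(num_images):
    """All unordered image pairs, as in exhaustive pairwise reconstruction."""
    return list(combinations(range(num_images), 2))


def retrieval_pairs(similarity, k):
    """Keep, for each image, its k most similar other images.

    `similarity` is a square matrix of pairwise scores (higher = more similar).
    This top-k rule is a hypothetical stand-in for an image-retrieval step.
    """
    n = len(similarity)
    pairs = set()
    for i in range(n):
        others = [j for j in range(n) if j != i]
        ranked = sorted(others, key=lambda j: similarity[i][j], reverse=True)
        for j in ranked[:k]:
            # Store pairs in canonical (smaller, larger) order to avoid duplicates.
            pairs.add((min(i, j), max(i, j)))
    return sorted(pairs)


if __name__ == "__main__":
    n, k = 100, 5
    # Toy similarity: images with nearby indices are assumed to overlap more.
    sim = [[-abs(i - j) for j in range(n)] for i in range(n)]
    print("exhaustive pairs:", len(exhaustive_pairs(n)))   # grows as n * (n - 1) / 2
    print("retrieval pairs: ", len(retrieval_pairs(sim, k)))  # at most n * k
```

The exhaustive pair count grows quadratically with the number of images, whereas a top-k scheme of this kind is bounded by n * k, which is the kind of reduction in pairwise work that retrieval aims for.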
