FlowCam: Training Generalizable 3D Radiance Fields without Camera Poses via Pixel-Aligned Scene Flow

Smith, Cameron, Du, Yilun, Tewari, Ayush, Sitzmann, Vincent

May-31-2023–arXiv.org Artificial Intelligence

Reconstruction of 3D neural fields from posed images has emerged as a promising method for self-supervised representation learning. The key challenge preventing the deployment of these 3D scene learners on large-scale video data is their dependence on precise camera poses from structure-from-motion, which is prohibitively expensive to run at scale. We propose a method that jointly reconstructs camera poses and 3D neural scene representations online and in a single forward pass. We estimate poses by first lifting frame-to-frame optical flow to 3D scene flow via differentiable rendering, preserving locality and shift-equivariance of the image processing backbone. SE(3) camera pose estimation is then performed via a weighted least-squares fit to the scene flow field. This formulation enables us to jointly supervise pose estimation and a generalizable neural scene representation via re-rendering the input video, and thus, train end-to-end and fully self-supervised on real-world video datasets. We demonstrate that our method performs robustly on diverse, real-world video, notably on sequences traditionally challenging to optimization-based pose estimation techniques.

artificial intelligence, camera pose, machine learning, (16 more...)

arXiv.org Artificial Intelligence

May-31-2023

arXiv.org PDF

Add feedback

Country:
- Asia > Japan
  - Honshū > Chūbu (0.14)
- North America > United States (0.14)

Genre:
- Research Report > Promising Solution (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks (0.47)
  - Vision (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found