Cameras as Rays: Pose Estimation via Ray Diffusion

Jason Y. Zhang, Amy Lin, Moneish Kumar, Tzu-Hsuan Yang, Deva Ramanan, Shubham Tulsiani

arXiv.org Artificial Intelligence 

Estimating camera poses is a fundamental task for 3D reconstruction and remains challenging given sparsely sampled views (<10). In contrast to existing approaches that pursue top-down prediction of global parametrizations of camera extrinsics, we propose a distributed representation of camera pose that treats a camera as a bundle of rays. This representation allows for a tight coupling with spatial image features, improving pose precision. We observe that this representation is naturally suited for set-level transformers and develop a regression-based approach that maps image patches to corresponding rays. To capture the inherent uncertainties in sparse-view pose inference, we adapt this approach to learn a denoising diffusion model, which allows us to sample plausible modes while improving performance. Our proposed methods, both regression- and diffusion-based, demonstrate state-of-the-art performance on camera pose estimation on CO3D while generalizing to unseen object categories and in-the-wild captures.

Figure caption (top): Given sparsely sampled images, our approach learns to denoise camera rays (represented using Plücker coordinates). We then recover camera intrinsics and extrinsics from the positions of the rays.
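As a concrete illustration of the ray representation the abstract describes, the sketch below back-projects image patch centers into world-space rays and encodes each ray in Plücker coordinates (direction d and moment m = c × d). This is a minimal standalone example, not the authors' implementation; the function name, the pinhole convention x_cam = R·x_world + t, and the placeholder intrinsics are all assumptions for illustration.

```python
import numpy as np

def pluecker_rays(K, R, t, pixels):
    """Encode camera rays through `pixels` as 6D Pluecker coordinates.

    Assumed conventions (illustrative, not from the paper's code):
      K: (3, 3) camera intrinsics
      R: (3, 3) rotation, t: (3,) translation, with x_cam = R @ x_world + t
      pixels: (N, 2) pixel coordinates
    Returns an (N, 6) array of [d, m] per ray.
    """
    # Camera center in world coordinates.
    c = -R.T @ t
    # Back-project homogeneous pixels to world-space ray directions.
    ones = np.ones((pixels.shape[0], 1))
    dirs_cam = (np.linalg.inv(K) @ np.hstack([pixels, ones]).T).T
    d = (R.T @ dirs_cam.T).T
    d /= np.linalg.norm(d, axis=1, keepdims=True)
    # Moment m = c x d is independent of which point on the ray is used.
    m = np.cross(np.broadcast_to(c, d.shape), d)
    return np.hstack([d, m])

# Example: identity pose, principal point at (64, 64).
K = np.array([[500.0, 0.0, 64.0],
              [0.0, 500.0, 64.0],
              [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)
rays = pluecker_rays(K, R, t, np.array([[64.0, 64.0]]))
print(rays)  # ray through the principal point: d = [0, 0, 1], m = [0, 0, 0]
```

Because the moment m = c × d is the same for every point on a ray, a set of such 6D vectors determines the camera up to the conventions above, which is what allows intrinsics and extrinsics to be recovered from denoised rays.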