[Cover figure. Panel labels: 360° image to video; Video Depth Estimation Model; Merge]

Neural Information Processing Systems 

To mitigate the distortions introduced by equirectangular projection, existing methods typically divide 360° images into perspective patches with little distortion. However, because these patches are processed independently, scale drift among them often produces inconsistent depth. Recently, video depth estimation (VDE) models have leveraged temporal consistency to produce stable depth predictions across frames. Inspired by this, we propose to represent a 360° image as a sequence of perspective frames, mimicking the viewpoint adjustments users make when exploring a 360° scene in virtual reality. The spatial consistency among perspective depth patches can thus be enhanced by exploiting the temporal consistency inherent in VDE models. To this end, we introduce a training-free pipeline for 360° monocular depth estimation, called ST2360D.
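As a rough illustration of the "360° image as a sequence of perspective frames" idea, the sketch below samples square perspective views from an equirectangular image while sweeping the camera yaw. It is not the pipeline described in the abstract: the function names `perspective_view` and `frame_sequence`, the fixed-pitch yaw sweep, the 90° field of view, and the nearest-neighbour sampling are all assumptions chosen for brevity. The VDE model and the step that merges per-frame depth back into a 360° map are not shown.

```python
import numpy as np


def perspective_view(equi, yaw_deg, pitch_deg=0.0, fov_deg=90.0, size=64):
    """Sample one square perspective view from an equirectangular image.

    Hypothetical helper: nearest-neighbour lookup, camera looks along +z,
    x points right, y points down. Positive pitch tilts the camera up.
    """
    h, w = equi.shape[:2]
    # Focal length in pixels for the requested field of view.
    f = 0.5 * size / np.tan(np.radians(fov_deg) / 2)

    # Pixel grid centred on the optical axis, turned into unit ray directions.
    coords = np.arange(size) - (size - 1) / 2
    x, y = np.meshgrid(coords, coords)
    rays = np.stack([x, y, np.full_like(x, f)], axis=-1)
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)

    # Rotate rays: pitch about the x axis first, then yaw about the y axis.
    yaw, pitch = np.radians(yaw_deg), np.radians(pitch_deg)
    rx = np.array([[1.0, 0.0, 0.0],
                   [0.0, np.cos(pitch), -np.sin(pitch)],
                   [0.0, np.sin(pitch), np.cos(pitch)]])
    ry = np.array([[np.cos(yaw), 0.0, np.sin(yaw)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(yaw), 0.0, np.cos(yaw)]])
    rays = rays @ (ry @ rx).T

    # Ray direction -> longitude/latitude -> equirectangular pixel indices.
    lon = np.arctan2(rays[..., 0], rays[..., 2])        # in [-pi, pi]
    lat = np.arcsin(np.clip(rays[..., 1], -1.0, 1.0))   # in [-pi/2, pi/2]
    u = np.floor((lon / (2 * np.pi) + 0.5) * w).astype(int) % w
    v = np.clip(np.floor((lat / np.pi + 0.5) * h).astype(int), 0, h - 1)
    return equi[v, u]


def frame_sequence(equi, num_frames=8, **view_kwargs):
    """Stack views taken at evenly spaced yaw angles into a video-like array."""
    yaws = np.linspace(0.0, 360.0, num_frames, endpoint=False)
    return np.stack([perspective_view(equi, yaw, **view_kwargs) for yaw in yaws])
```

With a 90° field of view, eight yaw steps of 45° give neighbouring frames a substantial overlap, which is the kind of frame-to-frame continuity a video model expects. The actual trajectory and overlap used by the method are not stated in this abstract.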
