Self-Supervised Surround-View Depth Estimation with Volumetric Feature Fusion
Neural Information Processing Systems
In this supplementary material, we provide details on the evaluation metrics and our network architecture, the trade-off between computational cost and depth accuracy, additional qualitative results, depth accuracy on overlap regions, point cloud results on the DDAD and nuScenes datasets, and the licenses of the existing assets used in our paper. To evaluate depth accuracy, we use the error metrics proposed by Eigen et al. [8]. Further details on our network architecture are given in Table 3; for more information about the implementation, please refer to our source code. Our model uses only 1D/2D convolutions and MLPs; we do not use 3D convolutions, which are computationally heavy and consume extensive memory. We use a pre-trained ResNet-18 [16] as the image encoder.
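As an illustration of the error metrics attributed to Eigen et al. [8] above, the following minimal sketch computes the quantities commonly reported under that name in the depth estimation literature (Abs Rel, Sq Rel, RMSE, RMSE log, and the threshold accuracies δ < 1.25^k). It is not taken from the authors' source code: the function name `depth_metrics`, the flat-list inputs, and the `min_depth`/`max_depth` defaults are assumptions made for this example.

```python
import math


def depth_metrics(gt, pred, min_depth=1e-3, max_depth=200.0):
    """Compute common depth-error metrics over valid ground-truth depths.

    gt, pred: flat sequences of depths in metres, aligned element-wise.
    min_depth / max_depth: hypothetical evaluation range (an assumption,
    not a value stated in the text above).
    """
    # Keep only entries with valid ground truth; clamp predictions to range.
    pairs = [
        (g, min(max(p, min_depth), max_depth))
        for g, p in zip(gt, pred)
        if min_depth < g < max_depth
    ]
    if not pairs:
        raise ValueError("no valid ground-truth depths in range")
    n = len(pairs)

    abs_rel = sum(abs(g - p) / g for g, p in pairs) / n
    sq_rel = sum((g - p) ** 2 / g for g, p in pairs) / n
    rmse = math.sqrt(sum((g - p) ** 2 for g, p in pairs) / n)
    rmse_log = math.sqrt(
        sum((math.log(g) - math.log(p)) ** 2 for g, p in pairs) / n
    )

    # Threshold accuracy: fraction of entries with max(g/p, p/g) < 1.25**k.
    ratios = [max(g / p, p / g) for g, p in pairs]
    acc = {f"a{k}": sum(r < 1.25 ** k for r in ratios) / n for k in (1, 2, 3)}

    return {"abs_rel": abs_rel, "sq_rel": sq_rel,
            "rmse": rmse, "rmse_log": rmse_log, **acc}


if __name__ == "__main__":
    print(depth_metrics([10.0, 20.0], [12.0, 20.0]))
```

In practice these metrics are usually evaluated per image over the pixels that have ground-truth depth and then averaged over the test set; the exact depth range and any scaling applied before evaluation depend on the dataset protocol.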
May-28-2025, 17:53:35 GMT