depth map
SurDis: A Surface Discontinuity Dataset for Wearable Technology to Assist Blind Navigation in Urban Environments
According to the World Health Organization, an estimated 2.2 billion people worldwide have a near or distance vision impairment. Difficulty in self-navigation is one of the greatest challenges to independence for blind and low vision (BLV) people. Through consultations with several BLV service providers, we realized that negotiating surface discontinuities is one of the most prominent challenges when navigating outdoor urban environments. Surface discontinuities are commonly formed by rises and drop-offs along a pathway. They can threaten a pedestrian's balance while walking, and perceiving such a threat is highly challenging for BLV people.
RSA: Resolving Scale Ambiguities in Monocular Depth Estimators through Language Descriptions
We propose a method for metric-scale monocular depth estimation. Inferring depth from a single image is an ill-posed problem due to the loss of scale from perspective projection during the image formation process. Any scale chosen is a bias, typically stemming from training on a dataset; hence, existing works have instead opted to use relative (normalized, inverse) depth. Our goal is to recover metric-scaled depth maps through a linear transformation. The crux of our method lies in the observation that certain objects (e.g., cars, trees, street signs) are typically found in or associated with certain types of scenes (e.g., outdoor).
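The linear transformation mentioned above can be illustrated with a minimal NumPy sketch. It assumes the transformation is a global scale and shift applied in inverse-depth space, which the excerpt does not state explicitly. The function names `fit_scale_shift` and `to_metric_depth` are hypothetical, and the least-squares fit against ground truth is only a stand-in for whatever component predicts the two parameters in the actual method.

```python
import numpy as np

def fit_scale_shift(rel_inv_depth, metric_depth, valid=None):
    """Least-squares fit of (scale, shift) such that
    scale * rel_inv_depth + shift ~= 1 / metric_depth.
    Assumption: this fit stands in for a learned predictor of the parameters."""
    if valid is None:
        valid = metric_depth > 0              # ignore pixels without ground truth
    x = rel_inv_depth[valid]                  # relative inverse depth, 1-D
    y = 1.0 / metric_depth[valid]             # metric inverse depth, 1-D
    A = np.stack([x, np.ones_like(x)], axis=1)
    (scale, shift), *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(scale), float(shift)

def to_metric_depth(rel_inv_depth, scale, shift, eps=1e-6):
    """Apply the linear transformation in inverse-depth space, then invert."""
    inv = scale * rel_inv_depth + shift
    return 1.0 / np.clip(inv, eps, None)      # clip avoids division by zero

# Synthetic example: build a relative map from a known scale and shift.
rng = np.random.default_rng(0)
metric = rng.uniform(1.0, 10.0, size=(8, 8))  # depths in metres
rel = (1.0 / metric - 0.05) / 0.4             # hidden scale 0.4, shift 0.05
scale, shift = fit_scale_shift(rel, metric)
recovered = to_metric_depth(rel, scale, shift)
```

Because only two scalars are involved, the same `to_metric_depth` call works regardless of how `scale` and `shift` are obtained.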
GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping
Generating novel views from a single image remains a challenging task due to the complexity of 3D scenes and the limited diversity in the existing multi-view datasets to train a model on. Recent research combining large-scale text-to-image (T2I) models with monocular depth estimation (MDE) has shown promise in handling in-the-wild images. In these methods, an input view is geometrically warped to novel views with estimated depth maps, then the warped image is inpainted by T2I models.
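The warp-then-inpaint pipeline described above relies on a geometric warping step, sketched below in NumPy under stated assumptions: a pinhole camera with intrinsics `K`, a relative pose `(R, t)` from source to target camera, nearest-pixel splatting, and a z-buffer to resolve occlusions. The function name `forward_warp` is hypothetical, and this is not the implementation of any particular paper. The returned hole mask marks the pixels an inpainting model would have to fill.

```python
import numpy as np

def forward_warp(image, depth, K, R, t):
    """Forward-warp image (H, W, C) into a target view using depth (H, W).
    Returns the warped image and a boolean mask of unfilled (hole) pixels."""
    H, W = depth.shape
    v, u = np.mgrid[0:H, 0:W]
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)

    # Unproject pixels to 3D points in the source camera frame.
    pts = (np.linalg.inv(K) @ pix.T) * depth.reshape(1, -1)       # (3, N)
    # Move points into the target camera frame and project them.
    pts_t = R @ pts + np.asarray(t, dtype=np.float64).reshape(3, 1)
    z = pts_t[2]
    proj = K @ pts_t
    z_safe = np.where(z > 1e-6, z, 1.0)                           # avoid divide-by-zero
    u_t = np.round(proj[0] / z_safe).astype(int)
    v_t = np.round(proj[1] / z_safe).astype(int)

    # Keep points in front of the camera that land inside the image.
    ok = (z > 1e-6) & (u_t >= 0) & (u_t < W) & (v_t >= 0) & (v_t < H)
    idx = np.flatnonzero(ok)

    # Z-buffer: for each target pixel, keep only the nearest source point.
    zbuf = np.full((H, W), np.inf)
    np.minimum.at(zbuf, (v_t[idx], u_t[idx]), z[idx])
    idx = idx[z[idx] == zbuf[v_t[idx], u_t[idx]]]

    colors = image.reshape(-1, image.shape[-1])
    warped = np.zeros_like(image)
    filled = np.zeros((H, W), dtype=bool)
    warped[v_t[idx], u_t[idx]] = colors[idx]
    filled[v_t[idx], u_t[idx]] = True
    return warped, ~filled

# Example: unit focal length, constant depth 1, camera offset along x.
rng = np.random.default_rng(0)
img = rng.random((4, 5, 3))
dep = np.ones((4, 5))
K = np.eye(3)
same, holes_same = forward_warp(img, dep, K, np.eye(3), np.zeros(3))
shifted, holes = forward_warp(img, dep, K, np.eye(3), np.array([1.0, 0.0, 0.0]))
```

Nearest-pixel splatting is the simplest choice; it leaves holes wherever the target view sees regions that were occluded or out of frame in the source view, which is exactly the area the inpainting stage is meant to complete.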
DäRF: Boosting Radiance Fields from Sparse Inputs with Monocular Depth Adaptation - Supplementary Materials - A Implementation Details A.1 Architecture
It represents a radiance field using tri-planes at three resolutions per plane: 128, 256, and 512 in both height and width, with a feature depth of 32. However, any MDE model can be utilized within our framework [19, 13, 12]. The training process takes approximately 3 hours. In other words, we can rewrite the above scheme as a closed problem. The results of DDP-NeRF with in-domain priors are 20.96,