Temporal Lidar Depth Completion
Kaskela, Pietari, Fischer, Philipp, Roman, Timo
Depth completion aims to recover a dense depth map from a sparse one, using additional inputs such as camera images as guidance (cf. Figure 2). The task is especially important for autonomous vehicles (AVs), where lidar sensors produce sparse depth maps but some perception algorithms require dense ones. For example, when the point cloud of the Velodyne HDL-64E lidar sensor used in the popular KITTI [6] dataset is projected onto the corresponding color image, it covers only 6% of the pixels. In addition to infilling and interpolating depth values for the remaining 94% of pixels, a proper depth completion solution must handle errors caused by the different mounting positions of the camera and the lidar sensor, by moving objects, and by the spinning motion of the lidar sensor itself. Figure 2 illustrates the inputs (color image, sparse depth) and the output (dense depth) of the task. Notice the occlusions (image regions with missing points) and overlaps (image regions with points from different depths), which arise because the lidar and the camera have slightly different viewpoints. Most state-of-the-art depth completion approaches rely on a U-Net [21] style backbone followed by a CSPN-based [2] refinement network [8, 14, 19].
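To make the notion of a sparse depth map concrete, the following minimal sketch projects lidar points onto an image plane and measures the fraction of pixels that receive a depth value. It is an illustration under stated assumptions, not the pipeline used in this work: the points are assumed to be already expressed in the camera frame, the camera is an ideal pinhole, and the names `project_to_sparse_depth` and `fill_ratio` as well as the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical. Where several points fall on the same pixel (an overlap), the sketch simply keeps the nearest one.

```python
from typing import List, Optional, Sequence, Tuple

# (x, y, z) in the camera frame, z pointing forward, in metres (assumption).
Point = Tuple[float, float, float]
DepthMap = List[List[Optional[float]]]


def project_to_sparse_depth(
    points: Sequence[Point],
    fx: float, fy: float, cx: float, cy: float,
    width: int, height: int,
) -> DepthMap:
    """Project 3D points with an ideal pinhole model into a sparse depth map.

    Pixels that receive no point stay None. If several points land on the
    same pixel, the smallest depth is kept (a simple z-buffer heuristic).
    """
    depth: DepthMap = [[None] * width for _ in range(height)]
    for x, y, z in points:
        if z <= 0.0:
            continue  # point is behind the camera
        u = int(round(fx * x / z + cx))  # pixel column
        v = int(round(fy * y / z + cy))  # pixel row
        if 0 <= u < width and 0 <= v < height:
            current = depth[v][u]
            if current is None or z < current:
                depth[v][u] = z
    return depth


def fill_ratio(depth: DepthMap) -> float:
    """Return the fraction of pixels that carry a depth value."""
    total = sum(len(row) for row in depth)
    valid = sum(1 for row in depth for d in row if d is not None)
    return valid / total if total else 0.0


if __name__ == "__main__":
    # Toy example: two points overlap on one pixel, one lies behind the camera.
    pts = [(0.0, 0.0, 10.0), (0.0, 0.0, 25.0), (1.0, 0.0, 10.0), (0.0, 0.0, -5.0)]
    sparse = project_to_sparse_depth(pts, fx=10.0, fy=10.0, cx=2.0, cy=2.0,
                                     width=5, height=5)
    print(f"fill ratio: {fill_ratio(sparse):.2f}")
```

Keeping the nearest point per pixel only resolves points that share a pixel. Background points that the lidar sees but the camera does not still land on neighbouring pixels next to foreground points, which is one source of the overlaps described above.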