Choi, Minwoo
Intrinsic Image Decomposition for Robust Self-supervised Monocular Depth Estimation on Reflective Surfaces
Choi, Wonhyeok, Hwang, Kyumin, Choi, Minwoo, Han, Kiljoon, Choi, Wonjoon, Shin, Mingyu, Im, Sunghoon
Self-supervised monocular depth estimation (SSMDE) has gained attention in the field of deep learning as it estimates depth without requiring ground truth depth maps. This approach typically uses a photometric consistency loss between a synthesized image, generated from the estimated depth, and the original image, thereby reducing the need for extensive dataset acquisition. However, the conventional photometric consistency loss relies on the Lambertian assumption, which often leads to significant errors when dealing with reflective surfaces that deviate from this model. To address this limitation, we propose a novel framework that incorporates intrinsic image decomposition into SSMDE. Our method synergistically trains for both monocular depth estimation and intrinsic image decomposition. The accurate depth estimation facilitates multi-image consistency for intrinsic image decomposition by aligning different view coordinate systems, while the decomposition process identifies reflective areas and excludes corrupted gradients from the depth training process. Furthermore, our framework introduces a pseudo-depth generation and knowledge distillation technique to further enhance the performance of the student model across both reflective and non-reflective surfaces. Comprehensive evaluations on multiple datasets show that our approach significantly outperforms existing SSMDE baselines in depth prediction, especially on reflective surfaces.
Self-supervised Monocular Depth Estimation Robust to Reflective Surface Leveraged by Triplet Mining
Choi, Wonhyeok, Hwang, Kyumin, Peng, Wei, Choi, Minwoo, Im, Sunghoon
Published as a conference paper at ICLR 2025
Wonhyeok Choi 1, Kyumin Hwang 1, Wei Peng 2, Minwoo Choi 1, Sunghoon Im 1
1 Electrical Engineering and Computer Science, Daegu Gyeongbuk Institute of Science and Technology, South Korea
2 Psychiatry and Behavioral Sciences, Stanford University, USA
{smu06117,kyumin,subminu,sunghoonim}@dgist.ac.kr, wepeng@stanford.edu
Abstract
Self-supervised monocular depth estimation (SSMDE) aims to predict the dense depth map of a monocular image by learning depth from RGB image sequences, eliminating the need for ground-truth depth labels. Although this approach simplifies data acquisition compared to supervised methods, it struggles with reflective surfaces, as they violate the assumption of Lambertian reflectance, leading to inaccurate training on such surfaces. To tackle this problem, we propose a novel training strategy for SSMDE that leverages triplet mining to pinpoint reflective regions at the pixel level, guided by the camera geometry between different viewpoints. The proposed reflection-aware triplet mining loss specifically penalizes inappropriate photometric error minimization on the localized reflective regions while preserving depth accuracy in non-reflective areas. We also incorporate a reflection-aware knowledge distillation method that enables a student model to selectively learn pixel-level knowledge from reflective and non-reflective regions. Evaluation results on multiple datasets demonstrate that our method effectively enhances depth quality on reflective surfaces and outperforms state-of-the-art SSMDE baselines.
This approach significantly simplifies data acquisition compared to traditional supervised methods (Fu et al., 2018; Lee et al., 2019; Bhat et al., 2021), which often involve high costs for annotation.
As such, many SSMDE studies (Godard et al., 2019; Zhou et al., 2017; Garg et al., 2016; Guizilini et al., 2020) have explored its viability as a mainstay for applications such as autonomous driving, highlighting its potential in outdoor environments. Despite these advantages, SSMDE approaches typically struggle to estimate depth accurately on non-Lambertian surfaces such as mirrors, transparent objects, and specular surfaces. This difficulty primarily arises from the assumption of Lambertian reflectance (Basri & Jacobs, 2003) embedded in most SSMDE methods.
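The effect of the Lambertian assumption can be illustrated with a toy example. The sketch below is not the method of either paper. It assumes a plain L1 photometric error on a 4x4 grayscale image, a view-dependent specular highlight that lands on different pixels in the two views even when the depth is correct, and a hand-set reflective mask. The names photometric_l1, diffuse, target, synthesized, and reflective are hypothetical and chosen only for illustration.

```python
import numpy as np


def photometric_l1(target, synthesized, mask=None):
    """Mean absolute intensity difference, optionally restricted to pixels where mask is True."""
    err = np.abs(target - synthesized)
    if mask is None:
        return float(err.mean())
    return float(err[mask].mean())


# Toy scene: a uniformly lit diffuse surface with intensity 0.5 everywhere.
diffuse = np.full((4, 4), 0.5)

# Target view: a specular highlight appears at pixel (1, 1).
target = diffuse.copy()
target[1, 1] += 0.4

# Source view warped into the target frame with the *correct* depth (assumed).
# Because the highlight depends on the viewpoint, it shows up at pixel (1, 2).
synthesized = diffuse.copy()
synthesized[1, 2] += 0.4

# Unmasked loss: nonzero even though the depth is correct, so minimizing it
# would push the depth away from the true geometry.
loss_all = photometric_l1(target, synthesized)

# Hand-set reflective mask (an assumption; the papers localize such regions
# with learned components rather than a fixed mask).
reflective = np.zeros((4, 4), dtype=bool)
reflective[1, 1] = True
reflective[1, 2] = True

# Masked loss: excluding the reflective pixels removes the spurious error.
loss_masked = photometric_l1(target, synthesized, mask=~reflective)

print(loss_all, loss_masked)
```

In this toy setting the unmasked error is nonzero at the correct depth, while the error over non-reflective pixels vanishes. This is the kind of corrupted training signal that localizing reflective regions is meant to avoid.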