Deep Depth Estimation from Thermal Image: Dataset, Benchmark, and Challenges

Shin, Ukcheol, Park, Jinsun

arXiv.org Artificial Intelligence 

--Achieving robust and accurate spatial perception under adverse weather and lighting conditions is crucial for the high-level autonomy of self-driving vehicles and robots. However, existing perception algorithms relying on the visible spectrum are highly affected by weather and lighting conditions. A long-wave infrared camera ( i.e., thermal imaging camera) can be a potential solution to achieve high-level robustness. However, the absence of large-scale datasets and standardized benchmarks remains a significant bottleneck to progress in active research for robust visual perception from thermal images. Lastly, we provide in-depth analyses and discuss the challenges revealed by the benchmark results, such as the performance variability for each modality under adverse conditions, domain shift between different sensor modalities, and potential research direction for thermal perception. AUTONOMOUS driving aims to develop intelligent vehicles capable of perceiving their surrounding environments, understanding current contextual information, and making decisions to drive safely without human intervention. Recent advancements in autonomous vehicles, such as Tesla and Waymo, have been driven by deep neural networks and large-scale vehicular datasets, such as KITTI [1], DDAD [2], and nuScenes [3]. Manuscript received March XX, 2025; revised April XX, 2025. This work was supported by the National Research Foundation of Korea(NRF) grant funded by the Korea government(MSIT)(RS-2024-00358935). Ukcheol Shin is with the Robotics Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America (e-mail: ushin@andrew.cmu.edu). Jinsun Park is with the School of Computer Science and Engineering, Pusan National University, Busan, Republic of Korea (e-mail: jspark@pusan.ac.kr). Color versions of one or more figures in this article are available at https://doi.org/xx.xxxx/TIV However, a major drawback of existing vehicular datasets is their reliance on visible-spectrum images, which are easily affected by weather and lighting conditions such as rain, fog, dust, haze, and low light. Therefore, recent research has actively explored alternative sensors such as Near-Infrared (NIR) cameras [8], Li-DARs [9], [10], radars [11], [12], and long-wave infrared (LWIR) cameras [13], [14] to achieve reliable and robust visual perception in adverse weather and lighting conditions. Among these sensors, LWIR camera ( i.e., thermal camera) has gained popularity because of its competitive price, adverse weather robustness, and unique modality information ( i.e., temperature).