Deep Depth Estimation from Thermal Image: Dataset, Benchmark, and Challenges
Shin, Ukcheol, Park, Jinsun
Achieving robust and accurate spatial perception under adverse weather and lighting conditions is crucial for the high-level autonomy of self-driving vehicles and robots. However, existing perception algorithms relying on the visible spectrum are highly affected by weather and lighting conditions. A long-wave infrared camera (i.e., thermal imaging camera) can be a potential solution for achieving high-level robustness. However, the absence of large-scale datasets and standardized benchmarks remains a significant bottleneck for active research on robust visual perception from thermal images. To this end, this work provides a large-scale dataset and standardized benchmark for deep depth estimation from thermal images. Lastly, we provide in-depth analyses and discuss the challenges revealed by the benchmark results, such as the performance variability of each modality under adverse conditions, the domain shift between different sensor modalities, and potential research directions for thermal perception.

Autonomous driving aims to develop intelligent vehicles capable of perceiving their surrounding environments, understanding the current contextual information, and making decisions to drive safely without human intervention. Recent advances in autonomous vehicles, such as those from Tesla and Waymo, have been driven by deep neural networks and large-scale vehicular datasets such as KITTI [1], DDAD [2], and nuScenes [3]. However, a major drawback of existing vehicular datasets is their reliance on visible-spectrum images, which are easily affected by weather and lighting conditions such as rain, fog, dust, haze, and low light. Therefore, recent research has actively explored alternative sensors, such as near-infrared (NIR) cameras [8], LiDARs [9], [10], radars [11], [12], and long-wave infrared (LWIR) cameras [13], [14], to achieve reliable and robust visual perception in adverse weather and lighting conditions. Among these sensors, the LWIR camera (i.e., thermal camera) has gained popularity because of its competitive price, adverse weather robustness, and unique modality information (i.e., temperature).
Bridging Spectral-wise and Multi-spectral Depth Estimation via Geometry-guided Contrastive Learning
Shin, Ukcheol, Lee, Kyunghyun, Oh, Jean
Deploying depth estimation networks in the real world requires high-level robustness against various adverse conditions to ensure safe and reliable autonomy. For this purpose, many autonomous vehicles employ multi-modal sensor systems, including an RGB camera, NIR camera, thermal camera, LiDAR, or radar. They mainly adopt one of two strategies for using multiple sensors: modality-wise inference and multi-modal fused inference. The former is flexible but memory-inefficient, and each single-modality prediction remains unreliable and vulnerable under adverse conditions. Multi-modal fusion can provide high-level reliability, yet it requires a specialized architecture. In this paper, we propose an effective solution, named the align-and-fuse strategy, for depth estimation from multi-spectral images. In the align stage, we align the embedding spaces of multiple spectral bands to learn a shareable representation across multi-spectral images by minimizing a contrastive loss over global features and spatially aligned local features, using a geometry cue. After that, in the fuse stage, we train an attachable feature fusion module that selectively aggregates multi-spectral features for reliable and robust predictions. Based on the proposed method, a single depth network can achieve both spectral-invariant and multi-spectral fused depth estimation while preserving reliability, memory efficiency, and flexibility.
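The align stage can be illustrated with a minimal, hedged sketch: the InfoNCE-style objective below pulls together pooled features of the same scene captured in two spectral bands and pushes apart features of different scenes. The function name, shapes, and temperature are illustrative assumptions, not the paper's actual implementation; the paper additionally applies the same idea to spatially aligned local features using a geometry cue.

import torch
import torch.nn.functional as F

def spectral_contrastive_loss(feat_rgb, feat_thermal, temperature=0.07):
    """InfoNCE-style alignment of global features from two spectral bands.

    feat_rgb, feat_thermal: (B, D) pooled encoder features of the same
    B scenes; matching rows are positives, all other pairs are negatives.
    """
    a = F.normalize(feat_rgb, dim=1)
    b = F.normalize(feat_thermal, dim=1)
    logits = a @ b.t() / temperature                    # (B, B) similarities
    targets = torch.arange(a.size(0), device=a.device)
    # Symmetric loss: align the RGB->thermal and thermal->RGB directions.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))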
Learning to Control Camera Exposure via Reinforcement Learning
Lee, Kyunghyun, Shin, Ukcheol, Lee, Byeong-Uk
Adjusting camera exposure in arbitrary lighting conditions is the first step toward ensuring the functionality of computer vision applications. Poorly adjusted camera exposure often leads to critical failures and performance degradation. Traditional camera exposure control methods require multiple convergence steps and time-consuming processing, making them unsuitable for dynamic lighting conditions. In this paper, we propose a new camera exposure control framework that rapidly adjusts camera exposure with real-time processing by exploiting deep reinforcement learning. The proposed framework consists of four contributions: 1) a simplified training ground that simulates the diverse and dynamic lighting changes of the real world, 2) a flickering- and image-attribute-aware reward design, along with a lightweight state design for real-time processing, 3) a static-to-dynamic lighting curriculum that gradually improves the agent's exposure-adjusting capability, and 4) domain randomization techniques that alleviate the limitations of the training ground and achieve seamless generalization in the wild. As a result, our proposed method rapidly reaches the desired exposure level within five steps with real-time processing (1 ms). Moreover, the acquired images are well exposed and show superior performance in various computer vision tasks, such as feature extraction and object detection.
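A minimal sketch of what a flickering- and attribute-aware reward could look like is given below. The target statistic, the weighting, and the function name are assumptions for illustration, not the exact reward used in the paper.

import numpy as np

def exposure_reward(img, exposure, prev_exposure,
                    target_mean=0.5, flicker_weight=0.1):
    """Reward a well-exposed image while penalizing abrupt exposure jumps.

    img: float image array scaled to [0, 1]; exposure and prev_exposure
    are the agent's current and previous actions (illustrative convention).
    """
    attribute_term = -abs(float(img.mean()) - target_mean)  # exposedness
    flicker_term = -abs(exposure - prev_exposure)           # anti-flicker
    return attribute_term + flicker_weight * flicker_term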
Learning Quadrupedal Locomotion with Impaired Joints Using Random Joint Masking
Kim, Mincheol, Shin, Ukcheol, Kim, Jung-Yup
Quadrupedal robots have played a crucial role in various environments, from structured settings to complex, harsh terrains, thanks to their agile locomotion ability. However, these robots can easily lose their locomotion functionality when damaged by external accidents or internal malfunctions. To address this, we propose a locomotion learning framework based on random joint masking. The proposed framework consists of three components: 1) a random joint masking strategy for simulating impaired-joint scenarios, 2) a joint state estimator that predicts the implicit status of the current joint condition from the past observation history, and 3) progressive curriculum learning that allows a single network to conduct both a normal gait and various joint-impaired gaits. We verify that our framework enables the Unitree Go1 robot to walk under various impaired joint conditions in real-world indoor and outdoor environments.

Recently, quadrupedal robots have played a crucial role in various applications, such as human rescue, disaster response, and harsh terrain exploration [1], [2], [3].
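The random joint masking idea can be sketched roughly as below: during training, the commands for a random subset of joints are zeroed out to mimic impaired joints. The joint count, masking probability, and function name are illustrative assumptions, not the paper's exact scheme.

import numpy as np

def random_joint_mask(action, num_joints=12, impair_prob=0.3,
                      rng=np.random.default_rng()):
    """Zero out commands for a random subset of joints to simulate damage.

    action: (num_joints,) torque or position commands from the policy.
    Returns the masked action and the mask itself, which could also serve
    as a training target for a joint state estimator.
    """
    mask = (rng.random(num_joints) >= impair_prob).astype(action.dtype)
    return action * mask, mask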
Complementary Random Masking for RGB-Thermal Semantic Segmentation
Shin, Ukcheol, Lee, Kyunghyun, Kweon, In So
RGB-thermal semantic segmentation is a potential solution for reliable semantic scene understanding in adverse weather and lighting conditions. However, previous studies have mostly focused on designing multi-modal fusion modules without considering the nature of multi-modal inputs. As a result, the networks easily become over-reliant on a single modality, making it difficult to learn complementary and meaningful representations for each modality. This paper proposes 1) a complementary random masking strategy for RGB-T images and 2) a self-distillation loss between clean and masked input modalities. The proposed masking strategy prevents over-reliance on a single modality and improves the accuracy and robustness of the network by forcing it to segment and classify objects even when one modality is only partially available. The proposed self-distillation loss further encourages the network to extract complementary and meaningful representations from a single modality or from complementary masked modalities. Based on the proposed method, we achieve state-of-the-art performance on three RGB-T semantic segmentation benchmarks. Our source code is available at https://github.com/UkcheolShin/CRM_RGBTSeg.
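A hedged sketch of complementary masking follows: a random patch mask hides part of the RGB image, and its complement hides the remaining patches in the thermal image, so every region stays visible in exactly one modality. The patch size, mask ratio, and tensor layout are assumptions for illustration.

import torch

def complementary_patch_masks(b, h, w, patch=16, mask_ratio=0.5):
    """Sample a patch-level mask for RGB and its complement for thermal.

    Returns two (b, 1, h, w) binary masks; multiplying each modality by
    its mask yields the complementary masked inputs.
    """
    gh, gw = h // patch, w // patch       # assumes h, w divisible by patch
    keep_rgb = (torch.rand(b, 1, gh, gw) >= mask_ratio).float()
    rgb_mask = keep_rgb.repeat_interleave(patch, dim=2) \
                       .repeat_interleave(patch, dim=3)
    thermal_mask = 1.0 - rgb_mask         # complement: no patch hidden twice
    return rgb_mask, thermal_mask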
DRL-ISP: Multi-Objective Camera ISP with Deep Reinforcement Learning
Shin, Ukcheol, Lee, Kyunghyun, Kweon, In So
In this paper, we propose a multi-objective camera ISP framework that utilizes deep reinforcement learning (DRL) together with a camera ISP toolbox consisting of network-based and conventional ISP tools. The proposed DRL-based camera ISP framework iteratively selects a suitable tool from the toolbox and applies it to the image to maximize a given vision-task-specific reward function. For this purpose, we implement a total of 51 ISP tools covering exposure correction, color-and-tone correction, white balance, sharpening, denoising, and others. We also propose an efficient DRL network architecture that can extract various aspects of an image and establish a rigid mapping between images and a large number of actions. Our proposed DRL-based ISP framework effectively improves image quality for each vision task, such as RAW-to-RGB image restoration, 2D object detection, and monocular depth estimation.
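The iterative tool-selection loop can be sketched as follows; the policy interface, the explicit stop action, and the tool signatures are hypothetical stand-ins, not the framework's actual API.

def apply_isp_pipeline(image, policy, tools, max_steps=10):
    """Iteratively apply DRL-selected ISP tools to an image.

    tools: list of callables, each mapping an image to a processed image
    (e.g., denoising, sharpening). The policy returns a tool index, with
    len(tools) treated here as an explicit stop action (an assumption).
    """
    for _ in range(max_steps):
        action = policy.select_action(image)
        if action == len(tools):        # stop: policy judges image is done
            break
        image = tools[action](image)    # apply the chosen ISP tool
    return image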
An Efficient Asynchronous Method for Integrating Evolutionary and Gradient-based Policy Search
Lee, Kyunghyun, Lee, Byeong-Uk, Shin, Ukcheol, Kweon, In So
Deep reinforcement learning (DRL) algorithms and evolution strategies (ES) have been applied to various tasks with excellent performance. The two have complementary properties: DRL offers good sample efficiency but poor stability, whereas ES offers stability but poor sample efficiency. Recently, there have been attempts to combine these algorithms, but these methods rely entirely on a synchronous update scheme, which cannot fully exploit the parallelism of ES. To address this, we adopt an asynchronous update scheme, which enables good time efficiency and diverse policy exploration. In this paper, we introduce Asynchronous Evolution Strategy-Reinforcement Learning (AES-RL), which maximizes the parallel efficiency of ES and integrates it with policy gradient methods. Specifically, we propose 1) a novel framework to merge ES and DRL asynchronously and 2) various asynchronous update methods that take full advantage of asynchronism, ES, and DRL, namely exploration and time efficiency, stability, and sample efficiency, respectively. The proposed framework and update methods are evaluated on continuous control benchmarks, showing superior performance and time efficiency compared to previous methods.
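A rough sketch of the asynchronous ES side of this idea is shown below: workers evaluate perturbed parameters in parallel, and their returns are folded into the update as they arrive, without a generation-wide barrier. This omits the RL (policy gradient) component and all AES-RL specifics; the fitness function, population size, and learning rate are placeholders.

import concurrent.futures as cf
import numpy as np

def fitness(params):
    """Placeholder fitness; in practice, return an episode's total reward."""
    return -float(np.sum(params ** 2))

def async_es_step(mean, sigma=0.1, pop_size=8, lr=0.05):
    """One ES update where evaluations are consumed as they complete."""
    noises = [np.random.randn(*mean.shape) for _ in range(pop_size)]
    grad = np.zeros_like(mean)
    with cf.ThreadPoolExecutor() as pool:
        futures = {pool.submit(fitness, mean + sigma * n): n
                   for n in noises}
        for fut in cf.as_completed(futures):    # no synchronous barrier
            grad += futures[fut] * fut.result()
    return mean + lr * grad / (pop_size * sigma)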