Liang, Guoqiang
Semi-Supervised Semantic Segmentation Based on Pseudo-Labels: A Survey
Ran, Lingyan; Li, Yali; Liang, Guoqiang; Zhang, Yanning
Semantic segmentation is an important and popular research area in computer vision that focuses on classifying the pixels in an image according to their semantics. However, supervised deep learning requires large amounts of data to train models, and labeling images pixel by pixel is time-consuming and laborious. This review aims to provide the first comprehensive and organized overview of state-of-the-art research on pseudo-label methods for semi-supervised semantic segmentation; we categorize these methods from different perspectives and present methods tailored to specific application areas. In addition, we explore the application of pseudo-label techniques to medical and remote-sensing image segmentation. Finally, we propose some feasible future research directions to address the existing challenges.
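(As a rough illustration of the core pseudo-labeling step shared by the methods this survey covers, here is a minimal PyTorch sketch, not code from the survey itself. The function name, the 0.9 confidence threshold, and the self-training usage are illustrative assumptions.)

```python
import torch
import torch.nn.functional as F

def generate_pseudo_labels(logits, threshold=0.9, ignore_index=255):
    """Turn predictions on unlabeled images into hard pseudo-labels.

    logits: (B, C, H, W) raw class scores from the segmentation network.
    Pixels whose top-class probability falls below `threshold` are set to
    `ignore_index` so the loss skips these unreliable positions.
    """
    probs = F.softmax(logits, dim=1)
    conf, labels = probs.max(dim=1)           # (B, H, W) each
    labels[conf < threshold] = ignore_index   # drop low-confidence pixels
    return labels

# Toy usage: the pseudo-labels supervise a pass on the unlabeled batch.
logits_u = torch.randn(2, 21, 64, 64)         # predictions, 21 classes
pseudo = generate_pseudo_labels(logits_u)
loss_u = F.cross_entropy(logits_u, pseudo, ignore_index=255)
```

In practice the pseudo-labels typically come from a teacher model (often an exponential moving average of the student) and supervise the student on strongly augmented views of the same images.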
Self-supervised Learning of Event-guided Video Frame Interpolation for Rolling Shutter Frames
Lu, Yunfan; Liang, Guoqiang; Wang, Lin
This paper makes the first attempt to tackle the challenging task of recovering arbitrary frame rate latent global shutter (GS) frames from two consecutive rolling shutter (RS) frames, guided by novel event camera data. Although events possess high temporal resolution, which is beneficial for video frame interpolation (VFI), a hurdle in tackling this task is the lack of paired GS frames. Another challenge is that RS frames are susceptible to distortion when capturing moving objects. To this end, we propose a novel self-supervised framework that leverages events to guide RS frame correction and VFI in a unified manner. Our key idea is to estimate the displacement field (DF), i.e., the non-linear dense 3D spatiotemporal information of all pixels during the exposure time, which allows for reciprocal reconstruction between RS and GS frames as well as arbitrary frame rate VFI. Specifically, a displacement field estimation (DFE) module is proposed to estimate the spatiotemporal motion from events, correcting the RS distortion and interpolating the GS frames in one step. We then combine the input RS frames and the DF to learn a mapping for RS-to-GS frame interpolation. However, as this mapping is highly under-constrained, we couple it with an inverse mapping (i.e., GS-to-RS) and RS frame warping (i.e., RS-to-RS) for self-supervision. As there is a lack of labeled datasets for evaluation, we generate two synthetic datasets and collect a real-world dataset to train and test our method. Experimental results show that our method yields performance comparable to or better than prior supervised methods.
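(The toy PyTorch module below is a self-contained sketch of the cycle-style self-supervision described in this abstract, not the authors' implementation: each learned mapping, the DFE, RS-to-GS, and GS-to-RS networks, is replaced by a single convolution, and the channel counts and event representation are assumptions.)

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CycleSketch(nn.Module):
    """Toy stand-in for the RS->GS / GS->RS self-supervision cycle."""
    def __init__(self):
        super().__init__()
        self.dfe = nn.Conv2d(2, 2, 3, padding=1)             # events -> DF
        self.rs_to_gs = nn.Conv2d(3 + 3 + 2, 3, 3, padding=1)
        self.gs_to_rs = nn.Conv2d(3 + 2, 3, 3, padding=1)

    def forward(self, rs_a, rs_b, events):
        df = self.dfe(events)                                 # displacement field
        gs = self.rs_to_gs(torch.cat([rs_a, rs_b, df], 1))    # forward: RS -> latent GS
        rs_a_hat = self.gs_to_rs(torch.cat([gs, df], 1))      # inverse: GS -> RS
        # Cycle loss: the re-rendered RS frame must match the real input,
        # constraining the otherwise under-constrained RS -> GS mapping.
        return F.l1_loss(rs_a_hat, rs_a)

model = CycleSketch()
rs_a, rs_b = torch.rand(1, 3, 32, 32), torch.rand(1, 3, 32, 32)
events = torch.rand(1, 2, 32, 32)                             # toy event voxel grid
loss = model(rs_a, rs_b, events)
loss.backward()                                               # trains without GS labels
```

The point of the sketch is the loss structure: supervision comes entirely from reconstructing the observed RS inputs, so no ground-truth GS frames are required.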
Cross-View Person Identification by Matching Human Poses Estimated With Confidence on Each Body Joint
Liang, Guoqiang (Xi'an Jiaotong University) | Lan, Xuguang (Xi'an Jiaotong University, Institute of Artificial Intelligence and Robotics) | Zheng, Kang (University of South Carolina) | Wang, Song (University of South Carolina) | Zheng, Nanning (Xi'an Jiaotong University)
Cross-view person identification (CVPI) from multiple temporally synchronized videos taken by multiple wearable cameras from different, varying views is a very challenging but important problem, which has attracted increasing interest recently. The current state-of-the-art CVPI performance is achieved by matching appearance and motion features across videos, while matching pose features does not work effectively, given the high inaccuracy of 3D human pose estimation on videos/images collected in the wild. In this paper, we introduce a new confidence metric for 3D human pose estimation and show that the combination of the inaccurately estimated human pose and the inferred confidence metric can be used to boost CVPI performance: the estimated pose information can be integrated with the appearance and motion features to achieve new state-of-the-art CVPI performance. More specifically, the confidence metric is measured at each human-body joint, and joints with higher confidence are weighted more in the pose matching for CVPI. In the experiments, we validate the proposed method on three wearable-camera video datasets and compare its performance against several other existing CVPI methods.
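(A minimal NumPy sketch of the confidence-weighted pose matching idea described above: joints estimated with higher confidence contribute more to the cross-view pose distance. The product weighting and the normalization are illustrative assumptions, not the paper's exact formulation.)

```python
import numpy as np

def weighted_pose_distance(pose_a, pose_b, conf_a, conf_b):
    """Confidence-weighted distance between two estimated 3D poses.

    pose_a, pose_b: (J, 3) arrays of 3D joint positions.
    conf_a, conf_b: (J,) per-joint confidence scores in [0, 1].
    A joint is trusted only if it is confidently estimated in both views.
    """
    w = conf_a * conf_b
    d = np.linalg.norm(pose_a - pose_b, axis=1)    # per-joint distance
    return np.sum(w * d) / (np.sum(w) + 1e-8)      # normalized weighted mean

# Toy usage with 15-joint skeletons.
rng = np.random.default_rng(0)
pa, pb = rng.normal(size=(15, 3)), rng.normal(size=(15, 3))
ca, cb = rng.uniform(size=15), rng.uniform(size=15)
print(weighted_pose_distance(pa, pb, ca, cb))
```

A smaller distance indicates the two views more likely show the same person; in the paper this pose cue is combined with appearance and motion features rather than used alone.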