StabStitch++: Unsupervised Online Video Stitching with Spatiotemporal Bidirectional Warps

Nie, Lang, Lin, Chunyu, Liao, Kang, Zhang, Yun, Liu, Shuaicheng, Zhao, Yao

arXiv.org Artificial Intelligence 

Abstract -- We direct attention to an emerging issue in video stitching, termed warping shake, which describes the temporal content shakes induced by sequentially unsmooth warps when image stitching is extended to video stitching. Even if the input videos are stable, the stitched video can still exhibit undesired warping shakes that degrade the visual experience. To address this issue, we propose StabStitch++, a novel video stitching framework that realizes spatial stitching and temporal stabilization simultaneously with unsupervised learning. First, unlike existing learning-based image stitching solutions that typically warp one image to align with another, we posit a virtual midplane between the original image planes and project both views onto it. Concretely, we design a differentiable bidirectional decomposition module to disentangle the homography transformation and incorporate it into our spatial warp, evenly spreading alignment burdens and projective distortions across the two views. Then, inspired by camera paths in video stabilization, we derive the mathematical expression of stitching trajectories in video stitching by carefully integrating spatial and temporal warps. Finally, a warp smoothing model is presented to produce stable stitched videos, trained with a hybrid loss that simultaneously encourages content alignment, trajectory smoothness, and online collaboration. Compared with StabStitch, which sacrifices alignment for stabilization, StabStitch++ makes no such compromise and optimizes both objectives simultaneously, especially in the online mode. To establish an evaluation benchmark and train the learning framework, we build a video stitching dataset with a rich diversity of camera motions and scenes.
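The bidirectional decomposition described above splits a single homography into two "half" warps so that each view is projected toward a virtual midplane rather than one view being warped entirely onto the other. As a hedged illustration (not the paper's exact module), this idea can be sketched with the matrix square root of a homography, using `scipy.linalg.fractional_matrix_power`:

```python
import numpy as np
from scipy.linalg import fractional_matrix_power

def midplane_warps(H):
    """Illustrative sketch: split a homography H (mapping view A onto
    view B) into two half-warps toward a virtual midplane, so each view
    absorbs roughly half of the alignment burden and projective
    distortion. This is a hypothetical decomposition via the matrix
    square root, not the learned module from the paper."""
    H = H / H[2, 2]                                 # normalize scale
    H_half = fractional_matrix_power(H, 0.5).real   # view A -> midplane
    H_half_inv = np.linalg.inv(H_half)              # view B -> midplane
    return H_half, H_half_inv

# A mild homography close to identity keeps the square root real.
H = np.array([[1.02, 0.01,  5.0],
              [0.00, 0.99, -3.0],
              [1e-5, 0.00,  1.0]])
H_half, H_half_inv = midplane_warps(H)
```

Composing the two half-warps recovers the original transformation (`H_half @ H_half == H`), so alignment is preserved while distortion is shared between the views.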
I. INTRODUCTION

Lang Nie, Chunyu Lin, and Yao Zhao are with the Institute of Information Science, Beijing Jiaotong University, Beijing 100044, China, and also with the Visual Intelligence +X International Cooperation Joint Laboratory of MOE, Beijing 100044, China (e-mail: nielang@bjtu.edu.cn). Kang Liao is with the School of Computing and Data Science, Nanyang Technological University, Singapore (e-mail: kang.liao@ntu.edu.sg). Yun Zhang is with the School of Media Engineering, Communication University of Zhejiang, Hangzhou 310018, China (e-mail: zhangyun@cuz.edu.cn). Shuaicheng Liu is with the School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China (e-mail: liushuaicheng@uestc.edu.cn). This work was supported by the National Natural Science Foundation of China (NSFC) under Grants U2441242 and 62172032, as well as by the Open Fund of Zhejiang Key Laboratory of Film and TV Media Technology.

VIDEO stitching techniques are commonly employed to create panoramic or wide field-of-view (FoV) displays from different viewpoints with limited FoVs.