Few-shot Video-to-Video Synthesis

Ting-Chun Wang, Ming-Yu Liu, Andrew Tao, Guilin Liu, Bryan Catanzaro, Jan Kautz

Neural Information Processing Systems 

Video-to-video synthesis (vid2vid) aims to convert an input semantic video, such as a video of human poses or segmentation masks, into an output photorealistic video. While the state of the art in vid2vid has advanced significantly, existing approaches share two major limitations. First, numerous images of a target human subject or scene are required for training. Second, a learned model has limited generalization capability: for example, a pose-to-human vid2vid model can only synthesize poses of the single person in the training set.