Reviews: Learning to Predict Layout-to-image Conditional Convolutions for Semantic Image Synthesis
This paper proposes a strongly conditioned network for generating images from semantic maps. How sensitive is this network to small changes in the input map? For example, given three sequential frames of a video (as segmentation maps), is the model consistent in assigning colors and structures, or do small changes in the geometry of the semantic objects have a large impact on the output? This is mostly curiosity, as a model with inherent smoothness has great potential for video applications. Some qualitative results comparing against other models were shown, but visualizing the important regions of the input conditioning and the influence of input perturbations on the model output could also yield valuable insight; methods such as GradCAM or related techniques may be applicable for checking the importance of input features.
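The perturbation test suggested above could be sketched as follows. Note that `generator` here is only a hypothetical stand-in for the paper's layout-to-image model, and the one-pixel shift is just one simple choice of geometric perturbation:

```python
import numpy as np

def generator(label_map):
    # Stand-in for a layout-to-image model (NOT the paper's network):
    # returns a deterministic "image" derived from the label map.
    return np.stack([label_map, label_map ** 2, label_map + 1], axis=-1).astype(float)

rng = np.random.default_rng(0)
seg = rng.integers(0, 5, size=(8, 8))

# Perturb the geometry slightly: shift the segmentation map by one pixel
seg_shifted = np.roll(seg, shift=1, axis=1)

out_a = generator(seg)
out_b = generator(seg_shifted)

# Sensitivity: mean output change relative to the fraction of changed input pixels
input_change = np.mean(seg != seg_shifted)
output_change = np.mean(np.abs(out_a - out_b))
print(input_change, output_change)
```

A model with the desired smoothness would show an `output_change` that stays small and roughly proportional to `input_change`.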
Learning to Predict Layout-to-image Conditional Convolutions for Semantic Image Synthesis
Semantic image synthesis aims at generating photorealistic images from semantic layouts. Previous approaches with conditional generative adversarial networks (GAN) show state-of-the-art performance on this task, which either feed the semantic label maps as inputs to the generator, or use them to modulate the activations in normalization layers via affine transformations. We argue that convolutional kernels in the generator should be aware of the distinct semantic labels at different locations when generating images. In order to better exploit the semantic layout for the image generator, we propose to predict convolutional kernels conditioned on the semantic label map to generate the intermediate feature maps from the noise maps and eventually generate the images. Moreover, we propose a feature pyramid semantics-embedding discriminator, which is more effective in enhancing fine details and semantic alignments between the generated images and the input semantic layouts than previous multi-scale discriminators.
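The core idea of the abstract, predicting convolutional kernels from the semantic layout rather than feeding the layout in as an ordinary input, can be illustrated with a minimal sketch. All shapes and the linear `predictor` below are illustrative assumptions (a 1x1 per-location convolution, not the paper's actual architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

H, W = 4, 4           # spatial size
num_labels = 3        # semantic classes
c_in, c_out = 2, 2    # feature channels

# One-hot semantic layout, shape (H, W, num_labels)
labels = rng.integers(0, num_labels, size=(H, W))
layout = np.eye(num_labels)[labels]

# Hypothetical kernel predictor: linear map from label vector to a flat 1x1 kernel
predictor = rng.normal(size=(num_labels, c_in * c_out))

# Predict a distinct kernel at every location, shape (H, W, c_out, c_in)
kernels = (layout @ predictor).reshape(H, W, c_out, c_in)

# Noise feature map, shape (H, W, c_in)
noise = rng.normal(size=(H, W, c_in))

# Apply the per-location kernels: a layout-conditioned 1x1 convolution
out = np.einsum('hwoi,hwi->hwo', kernels, noise)
print(out.shape)  # (4, 4, 2)
```

The point of the construction is that locations with different semantic labels are transformed by different kernels, so the convolution itself is "aware" of the layout.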
Xihui Liu, Guojun Yin, Jing Shao, Xiaogang Wang, Hongsheng Li