PasteGAN: A Semi-Parametric Method to Generate Image from Scene Graph
Despite exciting progress on high-quality image generation from structured (scene graph) or free-form (sentence) descriptions, most existing methods guarantee only image-level semantic consistency, i.e., that the generated image matches the semantic meaning of the description. They have not investigated synthesizing images in a more controllable way, such as finely manipulating the visual appearance of every object. Therefore, to generate images with preferred objects and rich interactions, we propose a semi-parametric method, PasteGAN, for generating an image from a scene graph and image crops, where the spatial arrangement of the objects and their pairwise relationships are defined by the scene graph and the object appearances are determined by the given object crops. To enhance the interactions among the objects in the output, we design a Crop Refining Network and an Object-Image Fuser to embed the objects as well as their relationships into one map. Multiple losses work collaboratively to ensure that the generated images closely respect the crops and comply with the scene graphs while maintaining excellent image quality. When crops are not provided, a crop selector picks the most compatible crops from our external object tank by encoding the interactions around the objects in the scene graph. Evaluated on the Visual Genome and COCO-Stuff datasets, our proposed method significantly outperforms the SOTA methods on Inception Score, Diversity Score and Fréchet Inception Distance. Extensive experiments also demonstrate our method's ability to generate complex and diverse images with given objects.
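To make the inputs described in the abstract concrete, the following is a minimal sketch of a scene graph stored as (subject, predicate, object) triples, together with a toy stand-in for the crop selector. It is not the authors' implementation: the paper's selector encodes the interactions around each object, whereas this sketch swaps in a simple Jaccard overlap between interaction sets. All names here (`SceneGraph`, `Crop`, `select_crop`, the example tank) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneGraph:
    """Objects plus (subject_idx, predicate, object_idx) triples. Hypothetical structure."""
    objects: tuple   # object categories, e.g. ("man", "horse", "grass")
    triples: tuple   # relationships between objects, indexed into `objects`

    def context(self, idx):
        """Return the set of interactions that involve the object at `idx`."""
        ctx = set()
        for s, p, o in self.triples:
            if s == idx:
                ctx.add(("subj", p, self.objects[o]))
            if o == idx:
                ctx.add(("obj", p, self.objects[s]))
        return ctx


@dataclass(frozen=True)
class Crop:
    """An entry of an external object tank. Hypothetical structure."""
    crop_id: str
    category: str
    context: frozenset   # interactions the crop had in its source image


def select_crop(graph, idx, tank):
    """Pick the same-category crop whose context best overlaps the target's.

    Assumption: Jaccard overlap stands in for the learned compatibility
    score used in the paper. Returns None if no crop matches the category.
    """
    target = graph.context(idx)
    candidates = [c for c in tank if c.category == graph.objects[idx]]
    if not candidates:
        return None

    def jaccard(crop):
        union = target | crop.context
        return len(target & crop.context) / len(union) if union else 1.0

    return max(candidates, key=jaccard)


# Usage: "man riding horse", "horse standing on grass".
graph = SceneGraph(
    objects=("man", "horse", "grass"),
    triples=((0, "riding", 1), (1, "standing on", 2)),
)
tank = [
    Crop("h1", "horse", frozenset({("obj", "riding", "man")})),
    Crop("h2", "horse", frozenset({("subj", "eating", "grass")})),
    Crop("m1", "man", frozenset()),
]
print(select_crop(graph, 1, tank).crop_id)  # prints "h1"
```

The selected crop fixes the appearance of an object while the scene graph fixes its placement and relationships. In the paper, the refinement and fusion of the crops into a single image are handled by learned networks, which this sketch does not attempt to model.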
Reviews: PasteGAN: A Semi-Parametric Method to Generate Image from Scene Graph
Limited novelty: The proposed approach is closely related to two lines of prior work: 1) sg2im [4], which generates images from scene graph representations, and 2) semi-parametric image synthesis [3], which leverages semantic layouts and training images to generate novel images. The key difference from sg2im is the use of image crops to perform semi-parametric synthesis. However, compared with prior work on semi-parametric methods [3], the primary difference, as the authors suggest (Lines 82-83), is the use of a graph convolution architecture, and a similar graph convolution method was already introduced in [4]. I'd like to see more justification from the authors regarding the technical novelty of this approach in the presence of these two lines of work.
Limited resolution: My concern about the limited novelty is exacerbated by the fact that the generated images are still low-resolution (64x64), as in prior work [4], even though high-resolution image crops are used to aid the image generation process. In contrast, related work [3] is able to generate images of much higher resolution, e.g., 512x1024, using its semi-parametric method (which was not compared against in the experiments).
Reviews: PasteGAN: A Semi-Parametric Method to Generate Image from Scene Graph
This submission received borderline positive reviews. While the reviewers ultimately did not reach consensus during the discussion period, one did step forward to 'champion' the paper, and another was supportive of this decision. This submission is a 'systems paper,' and should be evaluated as such. The paper does not focus on new algorithmic results, but rather on building a nontrivial system to achieve impressive results on an important problem, and it justifies its design decisions (e.g. with an ablation study). There is some concern about the output images being low-resolution.
PasteGAN: A Semi-Parametric Method to Generate Image from Scene Graph
LI, Yikang, Ma, Tao, Bai, Yeqi, Duan, Nan, Wei, Sining, Wang, Xiaogang