
Conditional Image Generation



Conditional Image Generation with PixelCNN Decoders

Neural Information Processing Systems

This work explores conditional image generation with a new image density model based on the PixelCNN architecture. The model can be conditioned on any vector, including descriptive labels or tags, or latent embeddings created by other networks. When conditioned on class labels from the ImageNet database, the model is able to generate diverse, realistic scenes representing distinct animals, objects, landscapes and structures. When conditioned on an embedding produced by a convolutional network given a single image of an unseen face, it generates a variety of new portraits of the same person with different facial expressions, poses and lighting conditions. We also show that conditional PixelCNN can serve as a powerful decoder in an image autoencoder. Additionally, the gated convolutional layers in the proposed model improve the log-likelihood of PixelCNN to match the state-of-the-art performance of PixelRNN on ImageNet, with greatly reduced computational cost.
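The gated convolutional layers described above combine image features with the conditioning vector h through a gated activation unit, y = tanh(W_f x + V_f h) * sigmoid(W_g x + V_g h). A minimal numpy sketch of that unit follows; for brevity the masked convolutions are collapsed to per-pixel (1x1) linear maps, and all array names are hypothetical, not the paper's notation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_conditional_layer(x, h, W_f, W_g, V_f, V_g):
    """Gated activation unit of a conditional PixelCNN-style layer:
        y = tanh(W_f x + V_f h) * sigmoid(W_g x + V_g h)
    x : (C_in, H, W) feature map; h : (D,) conditioning vector
    (e.g. a class-label embedding). The masked convolutions are
    collapsed to per-pixel linear maps to keep the sketch short."""
    feat = np.einsum('oc,chw->ohw', W_f, x) + (V_f @ h)[:, None, None]
    gate = np.einsum('oc,chw->ohw', W_g, x) + (V_g @ h)[:, None, None]
    return np.tanh(feat) * sigmoid(gate)

rng = np.random.default_rng(0)
C_in, C_out, D, H, W = 3, 4, 8, 5, 5
x = rng.standard_normal((C_in, H, W))
h = rng.standard_normal(D)  # conditioning vector, e.g. a label embedding
y = gated_conditional_layer(x, h,
                            rng.standard_normal((C_out, C_in)),
                            rng.standard_normal((C_out, C_in)),
                            rng.standard_normal((C_out, D)),
                            rng.standard_normal((C_out, D)))
```

Because the tanh branch is bounded in (-1, 1) and the sigmoid gate in (0, 1), every output activation stays strictly inside (-1, 1), which is one reason gated units train stably.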


Unsupervised Learning of Object Landmarks through Conditional Image Generation

Neural Information Processing Systems

We propose a method for learning landmark detectors for visual objects (such as the eyes and the nose in a face) without any manual supervision. We cast this as the problem of generating images that combine the appearance of the object as seen in a first example image with the geometry of the object as seen in a second example image, where the two examples differ by a viewpoint change and/or an object deformation. In order to factorize appearance and geometry, we introduce a tight bottleneck in the geometry-extraction process that selects and distils geometry-related features. Compared to standard image generation problems, which often use generative adversarial networks, our generation task is conditioned on both appearance and geometry and thus is significantly less ambiguous, to the point that adopting a simple perceptual loss formulation is sufficient. We demonstrate that our approach can learn object landmarks from synthetic image deformations or videos, all without manual supervision, while outperforming state-of-the-art unsupervised landmark detectors. We further show that our method is applicable to a large variety of datasets - faces, people, 3D objects, and digits - without any modifications.




Review for NeurIPS paper: ContraGAN: Contrastive Learning for Conditional Image Generation

Neural Information Processing Systems

Reviewers were split on this paper, with three recommending accept and one recommending reject. The main concerns were missing experiments on ImageNet and a lack of clarity on why the method should work, particularly with regard to how it stabilizes training. After the rebuttal, the reviewers and AC were more confident in the experimental results and recommend acceptance, but the authors are urged to 1) complete the full experiments on ImageNet, and 2) analyze stability over multiple runs and provide some discussion of why the proposed method should help stability. Please also see the other detailed recommendations in the reviews.


Reviews: Conditional Image Generation with PixelCNN Decoders

Neural Information Processing Systems

The paper addresses a significant problem in generative modeling and is quite interesting. However, the reviewer feels the current version is not well polished due to several issues in the experimental section. For the rebuttal, please focus on the points marked (*), (**), (***), and (****) in the following paragraphs. The reviewer is willing to change the score if all the concerns are addressed in the rebuttal. Novelty: The proposed model is technically novel in the sense that it explores conditional modeling within the recent Pixel(R/C)NN framework.


ContraGAN: Contrastive Learning for Conditional Image Generation

Neural Information Processing Systems

Conditional image generation is the task of generating diverse images using class label information. Although many conditional Generative Adversarial Networks (GANs) have shown realistic results, such methods consider only pairwise relations between the embedding of an image and the embedding of the corresponding label (data-to-class relations) as their conditioning losses. In this paper, we propose ContraGAN, which considers relations between multiple image embeddings in the same batch (data-to-data relations) as well as the data-to-class relations by using a conditional contrastive loss. The discriminator of ContraGAN judges the authenticity of given samples and minimizes a contrastive objective to learn the relations between training images. Simultaneously, the generator tries to generate realistic images that pass the authenticity check and have a low contrastive loss.
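The conditional contrastive loss described above can be sketched as an InfoNCE-style objective whose positives for each sample are its class ("proxy") embedding plus other same-class samples in the batch, and whose negatives are the remaining batch samples. The numpy sketch below is a simplified illustration of that idea, not the paper's exact formulation; all names and the temperature value are assumptions:

```python
import numpy as np

def conditional_contrastive_loss(z, class_emb, labels, temperature=0.1):
    """Simplified conditional contrastive (2C-style) loss sketch.
    z         : (N, D) l2-normalized image embeddings
    class_emb : (K, D) l2-normalized class embeddings
    labels    : (N,) integer class labels
    Positives: own class embedding + same-label batch samples
    (data-to-data relations); negatives: the rest of the batch."""
    N = z.shape[0]
    sim_zz = z @ z.T / temperature                       # data-to-data
    sim_zc = np.einsum('nd,nd->n', z, class_emb[labels]) / temperature
    losses = []
    for i in range(N):
        others = np.arange(N) != i                       # exclude self
        same = others & (labels == labels[i])
        num = np.exp(sim_zc[i]) + np.exp(sim_zz[i][same]).sum()
        den = np.exp(sim_zc[i]) + np.exp(sim_zz[i][others]).sum()
        losses.append(-np.log(num / den))
    return float(np.mean(losses))

rng = np.random.default_rng(1)
z = rng.standard_normal((6, 4))
z /= np.linalg.norm(z, axis=1, keepdims=True)
class_emb = rng.standard_normal((3, 4))
class_emb /= np.linalg.norm(class_emb, axis=1, keepdims=True)
labels = np.array([0, 0, 1, 1, 2, 2])
loss = conditional_contrastive_loss(z, class_emb, labels)
```

Since the positive terms are a subset of the denominator, the per-sample loss is always non-negative and shrinks as same-class embeddings cluster together.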


Reviews: Unsupervised Learning of Object Landmarks through Conditional Image Generation

Neural Information Processing Systems

Summary: This paper proposes a method for conditional image generation by jointly learning "structure" points such as face and body landmarks. The authors propose to use a convolutional neural network with a modified loss to capture the image transformation and landmarks. They evaluate their approach on a set of datasets including CelebA, VoxCeleb, and Human3.6M.
Positives:
- The problem addressed is an important one, and the authors attempt to solve it with a well-engineered approach.
Negatives:
- The pre-processing of the heatmaps, normalizing them into probabilities and then using a Gaussian kernel to produce the features, is a bit heuristic.
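The heatmap pipeline the review questions (heatmap, softmax normalization into a probability map, expected coordinates, Gaussian re-rendering) can be sketched as follows. This is an illustrative reconstruction of that common pattern, not the authors' code; function names and the sigma value are assumptions:

```python
import numpy as np

def soft_argmax(heatmap):
    """Normalize a heatmap into a probability map and take the
    expected (row, col) coordinate: a differentiable landmark."""
    p = np.exp(heatmap - heatmap.max())
    p /= p.sum()
    rows, cols = np.indices(heatmap.shape)
    return (p * rows).sum(), (p * cols).sum()

def render_gaussian(center, shape, sigma=1.5):
    """Re-render the landmark as an isotropic Gaussian feature map,
    discarding everything about the heatmap except its location."""
    rows, cols = np.indices(shape)
    d2 = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

hm = np.zeros((16, 16))
hm[5, 7] = 50.0                 # sharp peak -> near-delta probability map
r, c = soft_argmax(hm)
g = render_gaussian((r, c), hm.shape)
```

The Gaussian re-rendering is the "tight bottleneck": only the landmark coordinates survive, which is what forces the network to put geometry, and nothing else, into these channels.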


Conditional Image Generation with Pretrained Generative Model

Shrestha, Rajesh, Xie, Bowen

arXiv.org Artificial Intelligence

In recent years, diffusion models have gained popularity for their ability to generate higher-quality images than GAN models. However, like other large generative models, they require a huge amount of data, computational resources, and meticulous tuning to train successfully. This poses a significant challenge, rendering training infeasible for most individuals. As a result, the research community has devised methods to leverage pretrained unconditional diffusion models with additional guidance for the purpose of conditional image generation. These methods enable image generation conditioned on diverse inputs and, most importantly, circumvent the need for training the diffusion model. In this paper, our objective is to reduce the time and computational overhead introduced by the addition of guidance in diffusion models, while maintaining comparable image quality. We propose a set of methods based on our empirical analysis, demonstrating a reduction in computation time by approximately threefold.
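The guidance idea referenced above, in its classifier-guidance form, shifts a pretrained unconditional model's noise prediction by the gradient of a conditioning log-probability before computing the usual DDPM posterior mean. A toy numpy sketch of one such reverse step, with stand-in values in place of a real network and classifier (all names and schedule values are assumptions):

```python
import numpy as np

def guided_ddpm_mean(x_t, eps_pred, grad_log_p, alpha_t, alpha_bar_t, scale):
    """One reverse-diffusion posterior mean with classifier-style guidance:
    shift the unconditional noise prediction eps_pred by the gradient of a
    conditioning log-probability, scaled by the guidance weight.
    scale = 0 recovers the unconditional model exactly."""
    eps_hat = eps_pred - scale * np.sqrt(1.0 - alpha_bar_t) * grad_log_p
    return (x_t - (1.0 - alpha_t) / np.sqrt(1.0 - alpha_bar_t) * eps_hat) \
        / np.sqrt(alpha_t)

rng = np.random.default_rng(2)
x_t = rng.standard_normal(4)
eps_pred = rng.standard_normal(4)   # stand-in for the pretrained network
grad = np.ones(4)                   # stand-in conditioning gradient
mean_uncond = guided_ddpm_mean(x_t, eps_pred, grad, 0.99, 0.5, scale=0.0)
mean_guided = guided_ddpm_mean(x_t, eps_pred, grad, 0.99, 0.5, scale=1.0)
```

Each guided step costs one extra gradient evaluation of the conditioning model, which is exactly the overhead the paper aims to reduce.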