Most deep latent factor models choose simple priors for simplicity, for tractability, or because it is unclear what prior to use. Recent studies show that the choice of prior can have a profound effect on the expressiveness of the model, especially when its generative network has limited capacity. In this paper, we propose to learn a proper prior from data for adversarial autoencoders (AAEs). We introduce the notion of code generators, which transform manually selected simple priors into ones that better characterize the data distribution. Experimental results show that the proposed model generates images of better quality and learns better disentangled representations than AAEs in both supervised and unsupervised settings. Lastly, we demonstrate its ability to perform cross-domain translation in a text-to-image synthesis task.
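To make the idea of a code generator concrete, the sketch below draws z from a simple N(0, I) prior and pushes it through a small network, so the resulting codes follow a more flexible, non-Gaussian prior. It is a minimal illustration under stated assumptions, not the paper's architecture: the two-layer tanh MLP, the helper names (init_code_generator, code_generator, sample_learned_prior) and the dimensions are all hypothetical, and the weights are random, untrained placeholders. In the proposed model these weights would be learned from data, and the AAE's adversarial regularizer would compare encoder outputs against such codes instead of against raw N(0, I) samples; that training is not shown.

```python
import math
import random

def init_code_generator(z_dim, hidden_dim, code_dim, rng):
    """Random, UNTRAINED weights for a hypothetical two-layer MLP code generator."""
    w1 = [[rng.gauss(0.0, 0.5) for _ in range(z_dim)] for _ in range(hidden_dim)]
    w2 = [[rng.gauss(0.0, 0.5) for _ in range(hidden_dim)] for _ in range(code_dim)]
    return w1, [0.0] * hidden_dim, w2, [0.0] * code_dim

def code_generator(z, params):
    """Map a simple-prior sample z to a code under the (assumed) learned prior."""
    w1, b1, w2, b2 = params
    hidden = [math.tanh(sum(w * zi for w, zi in zip(row, z)) + b) for row, b in zip(w1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(w2, b2)]

def sample_learned_prior(n, z_dim, params, rng):
    """Draw z ~ N(0, I) and transform each sample with the code generator."""
    return [code_generator([rng.gauss(0.0, 1.0) for _ in range(z_dim)], params) for _ in range(n)]

rng = random.Random(0)
params = init_code_generator(z_dim=8, hidden_dim=16, code_dim=4, rng=rng)
codes = sample_learned_prior(5, 8, params, rng)  # 5 codes, each of dimension 4
```

Keeping the simple prior as the input means sampling stays as cheap as before; only the mapping on top of it is learned.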
One of the main motivations for training high-quality image generative models is their potential use as tools for image manipulation. Recently, generative adversarial networks (GANs) have been able to generate images of remarkable quality. Unfortunately, adversarially trained unconditional generator networks have not been successful as image priors. One of the main requirements for a network to act as a generative image prior is the ability to generate every possible image from the target distribution. Adversarial learning often suffers from mode collapse, which manifests as generators that cannot produce some modes of the target distribution. Another requirement that is often not satisfied is invertibility, i.e., having an efficient way of finding a valid input latent code for a given output image. In this work, we show that, unlike earlier GANs, the very recently proposed style generators are quite easy to invert. We use this important observation to propose style generators as general-purpose image priors. We show that style generators outperform other GANs, as well as Deep Image Prior, as priors for image enhancement tasks. The latent space spanned by style generators satisfies linear identity-pose relations. This latent-space linearity, combined with invertibility, allows us to animate still facial images without supervision. Extensive experiments support the main contributions of this paper.
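Invertibility, as defined above, means finding a latent code z whose generated image matches a given image x. One common way to do this, assumed here because the abstract does not spell out the procedure, is to minimize the reconstruction error ||G(z) - x||^2 over z by gradient descent. The sketch uses a hypothetical toy linear "generator" G(z) = A z in place of a deep style generator, so the gradient 2 A^T (A z - x) can be written by hand; the matrix A, step size and step count are illustrative choices. For a real, nonconvex generator this optimization carries no such convergence guarantee.

```python
def generate(A, z):
    """Toy linear 'generator': image = A @ z (a stand-in for a deep style generator)."""
    return [sum(a * zj for a, zj in zip(row, z)) for row in A]

def invert(A, x, steps=200, lr=0.05):
    """Find z minimizing ||generate(A, z) - x||^2 by plain gradient descent."""
    z = [0.0] * len(A[0])
    for _ in range(steps):
        residual = [g - xi for g, xi in zip(generate(A, z), x)]
        # Gradient of the squared error with respect to z: 2 * A^T @ residual.
        grad = [2.0 * sum(A[i][j] * residual[i] for i in range(len(A))) for j in range(len(z))]
        z = [zj - lr * gj for zj, gj in zip(z, grad)]
    return z

A = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # full column rank, so the inverse is unique
target = generate(A, [0.5, -1.0])           # an "image" the generator can produce
z_hat = invert(A, target)                   # recovers approximately [0.5, -1.0]
```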
In this paper, we show that how well a Wasserstein GAN approximates a distribution depends on both the width and depth (capacity) of the generators and discriminators and on the number of training samples. We develop a quantitative generalization bound on the Wasserstein distance between the generated distribution and the target distribution. It implies that, with sufficient training samples and with generators and discriminators of appropriate width and depth, the learned Wasserstein GAN can approximate distributions well. We find that discriminators suffer considerably from the curse of dimensionality, meaning that GANs place higher capacity requirements on discriminators than on generators, which is consistent with the theory in arXiv:1703.00573v5 [cs.LG]. More importantly, overly deep (high-capacity) generators may yield worse results after training than low-capacity generators if the discriminators are not strong enough. Unlike the Wasserstein GAN of arXiv:1701.07875v3 [stat.ML], we adopt GroupSort neural networks (arXiv:1811.05381v2 [cs.LG]) in the model because they better approximate 1-Lipschitz functions. Compared with some existing generalization (convergence) analyses of GANs, we expect our results to be more widely applicable.
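The GroupSort activation mentioned above splits a layer's pre-activations into contiguous groups and sorts the values inside each group; with a group size of 2 it reduces to the MaxMin activation. Because it only permutes entries within groups, it is 1-Lipschitz and its Jacobian is a permutation matrix, so it preserves gradient norms, which is the property that motivates its use for 1-Lipschitz critics. The sketch below is a plain-Python illustration; the ascending sort order and the function name group_sort are conventions chosen here, not taken from the paper.

```python
def group_sort(x, group_size=2):
    """GroupSort: split x into contiguous groups of `group_size` and sort each (ascending)."""
    if len(x) % group_size != 0:
        raise ValueError("length of x must be divisible by group_size")
    out = []
    for start in range(0, len(x), group_size):
        out.extend(sorted(x[start:start + group_size]))
    return out

print(group_sort([3, 1, 4, 1, 5, 9]))                # [1, 3, 1, 4, 5, 9]  (MaxMin)
print(group_sort([3, 1, 4, 1, 5, 9], group_size=3))  # [1, 3, 4, 1, 5, 9]
```

A group size equal to the layer width sorts the whole vector ("FullSort"); smaller groups keep the operation cheaper.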
Generative adversarial networks (GANs) have shown great promise in generating complex data such as images. A standard practice in GANs is to discard the discriminator after training and use only the generator for sampling. However, this discards valuable information about the real data distribution that the discriminator has learned. In this work, we propose a collaborative sampling scheme between the generator and discriminator for improved data generation. Guided by the discriminator, our approach refines generated samples through gradient-based optimization, shifting the generator distribution closer to the real data distribution. Additionally, we present a practical discriminator shaping method that can further improve the sample refinement process. Orthogonal to existing GAN variants, our proposed method offers a new degree of freedom in GAN sampling. We demonstrate its efficacy through experiments on synthetic data and image generation tasks.
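Discriminator-guided refinement can be illustrated in one dimension: take generated samples and move each one a few steps along the gradient of log D(x), so samples drift toward regions the discriminator rates as real. This is a minimal sketch under assumptions not stated in the abstract: refinement is done directly in sample space, the objective is log D(x), and the discriminator is a hypothetical analytic function D(x) = sigmoid(C - (x - MU)^2) that peaks near MU, standing in for a trained network. The discriminator shaping step is not modeled.

```python
import math

MU, C = 2.0, 2.0  # hypothetical location and height of the toy discriminator's peak

def disc(x):
    """Toy discriminator D(x) = sigmoid(C - (x - MU)^2); highest near MU."""
    return 1.0 / (1.0 + math.exp(-(C - (x - MU) ** 2)))

def grad_log_disc(x):
    """d/dx log D(x) = (1 - D(x)) * d(logit)/dx, where d(logit)/dx = -2 (x - MU)."""
    return (1.0 - disc(x)) * (-2.0 * (x - MU))

def refine(samples, steps=20, lr=0.1):
    """Discriminator-guided refinement: a few gradient-ascent steps on log D per sample."""
    refined = []
    for x in samples:
        for _ in range(steps):
            x = x + lr * grad_log_disc(x)
        refined.append(x)
    return refined

generated = [-1.0, 0.0, 0.5, 3.5]   # pretend generator outputs
refined = refine(generated)         # each sample moves toward MU without overshooting
```

Running many more steps would pull every sample onto the discriminator's peak and collapse the distribution, so this sketch deliberately takes only a few steps.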
Implicit generative models are difficult to train because they do not define explicit probability density functions. The well-known minimax framework proposed by generative adversarial nets (GANs) is equivalent to minimizing the Jensen-Shannon divergence and suffers from mode collapse in practice. In this paper, we propose the learning by teaching (LBT) framework, which trains implicit generative models by incorporating an auxiliary explicit model. In LBT, an explicit model is introduced to learn the distribution defined by the implicit model, and the latter's goal is to teach the explicit model to cover the training data. Formally, our method is formulated as a bilevel optimization problem whose optimum implies that we obtain the MLE of the implicit model. We also adopt the unrolling trick to make the optimization problem differentiable with respect to the implicit model's parameters. Experimental results demonstrate the effectiveness of the proposed method.
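The bilevel structure and the unrolling trick can be seen in a deliberately tiny 1D setting. The implicit model is a sampler x = theta + eps with eps ~ N(0, 1); the explicit model is a Gaussian N(phi, 1) fitted to the implicit model's samples by K unrolled gradient-ascent steps on log-likelihood (inner problem). The implicit model then updates theta to maximize the training-data log-likelihood under the unrolled explicit model (outer problem), differentiating through the K inner steps. Everything here is an illustrative assumption, not the paper's setup: the Gaussian models, the hand-derived derivatives in place of autodiff, warm-starting phi between outer steps, and the values of K, alpha, beta and the batch size. In this toy the MLE of theta is simply the data mean.

```python
import random

def lbt_toy(data, outer_steps=100, unroll_k=5, alpha=0.5, beta=0.5, batch=1000, seed=0):
    """Toy learning-by-teaching loop: implicit x = theta + eps, explicit N(phi, 1)."""
    rng = random.Random(seed)
    data_mean = sum(data) / len(data)
    theta, phi = 0.0, 0.0
    for _ in range(outer_steps):
        # Sample the implicit model; only the batch mean enters the Gaussian MLE gradient.
        eps_mean = sum(rng.gauss(0.0, 1.0) for _ in range(batch)) / batch
        sample_mean = theta + eps_mean
        # Inner problem, unrolled K steps: phi <- phi + alpha * mean(x_i - phi).
        # dphi tracks d(phi_k)/d(theta); the starting phi is treated as a constant.
        phi_k, dphi = phi, 0.0
        for _ in range(unroll_k):
            phi_k = phi_k + alpha * (sample_mean - phi_k)
            dphi = (1.0 - alpha) * dphi + alpha
        # Outer objective: mean_j log N(data_j; phi_K, 1); chain rule through phi_K.
        grad_theta = (data_mean - phi_k) * dphi
        theta += beta * grad_theta
        phi = phi_k  # warm-start the explicit model for the next outer step
    return theta, phi

theta, phi = lbt_toy([2.5, 3.5, 3.0, 2.8, 3.2])  # both end up close to the data mean, 3.0
```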