On How Well Generative Adversarial Networks Learn Densities: Nonparametric and Parametric Results Machine Learning

We study in this paper the rate of convergence for learning distributions with the Generative Adversarial Networks (GAN) framework, which subsumes Wasserstein, Sobolev, and MMD GANs as special cases. We study a wide range of parametric and nonparametric target distributions under a collection of objective evaluation metrics. On the nonparametric end, we investigate the minimax optimal rates and the fundamental difficulty of density estimation under the adversarial framework. On the parametric end, we establish a theory for neural network classes that characterizes the interplay between the choice of generator and discriminator. We investigate how to improve the GAN framework with better theoretical guarantees through the lens of regularization. We discover and isolate a new notion of regularization, called the "generator/discriminator pair regularization", that sheds light on the advantage of GANs over classic parametric and nonparametric approaches to density estimation.
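The evaluation metrics this framework unifies are integral probability metrics $d_{\mathcal{F}}(\mu, \nu) = \sup_{f \in \mathcal{F}} \mathbb{E}_\mu f - \mathbb{E}_\nu f$; choosing $\mathcal{F}$ as a Lipschitz ball, a Sobolev ball, or an RKHS ball gives the Wasserstein, Sobolev, and MMD cases. As a minimal sketch (not the paper's code), the MMD case admits a closed-form sample estimate:

```python
import numpy as np

def mmd2_rbf(x, y, sigma=1.0):
    """Biased squared-MMD estimate with an RBF kernel.

    Toy illustration: the MMD is the integral probability metric
    obtained when the discriminator class F is the unit ball of the
    RKHS of the chosen kernel.
    """
    def k(a, b):
        # Pairwise squared distances, then the Gaussian kernel.
        d2 = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
        return np.exp(-d2 / (2 * sigma**2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

rng = np.random.default_rng(0)
same = mmd2_rbf(rng.normal(size=(500, 2)), rng.normal(size=(500, 2)))
diff = mmd2_rbf(rng.normal(size=(500, 2)), rng.normal(3.0, 1.0, size=(500, 2)))
# Samples from the same distribution give a much smaller estimate
# than samples from two well-separated distributions.
```

The kernel bandwidth `sigma` and sample sizes here are arbitrary choices for illustration.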

Approximation for Probability Distributions by Wasserstein GAN Machine Learning

In this paper, we show that the approximation of distributions by Wasserstein GANs depends both on the width/depth (capacity) of the generators and discriminators and on the number of training samples. We develop a quantified generalization bound for the Wasserstein distance between the generated distribution and the target distribution. It implies that, with sufficiently many training samples, generators and discriminators of suitable width and depth allow the learned Wasserstein GAN to approximate distributions well. We find that discriminators suffer from the curse of dimensionality, meaning that GANs place higher capacity requirements on discriminators than on generators, which is consistent with the theory in arXiv:1703.00573v5 [cs.LG]. More importantly, overly deep (high-capacity) generators may yield worse results after training than low-capacity generators if the discriminators are not strong enough. Unlike the original Wasserstein GAN (arXiv:1701.07875v3 [stat.ML]), we adopt GroupSort neural networks (arXiv:1811.05381v2 [cs.LG]) in the model for their better approximation of 1-Lipschitz functions. Compared to some existing generalization (convergence) analyses of GANs, we expect our results to be more broadly applicable.
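The GroupSort activation referenced above (arXiv:1811.05381) splits the units of each layer into groups and sorts within each group; since sorting is a permutation, the activation is norm-preserving and 1-Lipschitz, which is what makes such networks good approximators of the 1-Lipschitz discriminators a Wasserstein GAN needs. A minimal sketch of the activation (not the paper's implementation):

```python
import numpy as np

def group_sort(x, group_size=2):
    """GroupSort activation: split the units of each input row into
    consecutive groups of `group_size` and sort within each group.

    Sorting permutes coordinates, so this map is 1-Lipschitz; combined
    with norm-constrained weights, GroupSort networks can approximate
    1-Lipschitz functions well (arXiv:1811.05381).
    """
    n, d = x.shape
    assert d % group_size == 0, "width must be divisible by group size"
    g = x.reshape(n, d // group_size, group_size)
    return np.sort(g, axis=-1).reshape(n, d)

x = np.array([[3.0, -1.0, 0.5, 2.0]])
y = group_sort(x)  # groups (3, -1) and (0.5, 2) each sorted in place
```

With `group_size=2` this reduces to the MaxMin/OPLU activation, the special case most often used in practice.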

Bayesian Conditional Generative Adversarial Networks Machine Learning

Traditional GANs use a deterministic generator function (typically a neural network) to transform a random noise input $z$ into a sample $\mathbf{x}$ that the discriminator seeks to distinguish. We propose a new GAN, called Bayesian Conditional Generative Adversarial Networks (BC-GANs), that uses a random generator function to transform a deterministic input $y'$ into a sample $\mathbf{x}$. Our BC-GANs extend traditional GANs to a Bayesian framework and naturally handle unsupervised, supervised, and semi-supervised learning problems. Experiments show that the proposed BC-GANs outperform state-of-the-art methods.
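The shift from a random input to a random function can be illustrated with a toy sketch (our reading of the abstract, not the authors' model): the generator's weights are drawn from a distribution, so a single deterministic input $y'$ induces a distribution over outputs. The linear map and noise scale below are hypothetical stand-ins for a learned weight posterior.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_generator(y, w_mean, n_draws=5, noise_scale=0.1):
    """Toy sketch of a random generator function: instead of a fixed
    network mapping noise z to x, the weights W are random, so one
    deterministic input y yields a distribution over outputs x = W y.

    `w_mean` and `noise_scale` stand in for a learned weight
    posterior; they are illustrative assumptions, not BC-GAN itself.
    """
    samples = []
    for _ in range(n_draws):
        w = w_mean + noise_scale * rng.normal(size=w_mean.shape)  # draw weights
        samples.append(w @ y)
    return np.stack(samples)

w_mean = rng.normal(size=(3, 2))
xs = random_generator(np.array([1.0, 2.0]), w_mean)
# The same input y produces distinct samples because the weights vary.
```

In the deterministic-generator setting, all randomness enters through $z$; here it enters through the weights, which is what lets the Bayesian framework cover supervised and semi-supervised settings with the same machinery.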

Stackelberg GAN: Towards Provable Minimax Equilibrium via Multi-Generator Architectures Machine Learning

Generative Adversarial Nets (GANs) are emerging objects of study in machine learning, computer vision, natural language processing, and many other domains. In machine learning, the study of this framework has led to significant advances in adversarial defenses [28, 24] and machine security [4, 24]. In computer vision and natural language processing, GANs have improved performance over standard generative models for images and text [13], such as the variational autoencoder [16] and the deep Boltzmann machine [22]. The main technique for achieving this is a minimax two-player game between generator and discriminator, designed so that the generator tries to confuse the discriminator with its generated content while the discriminator tries to distinguish real images/text from what the generator creates. Despite a large number of GAN variants, many fundamental questions remain unresolved. One longstanding challenge is designing universal, easy-to-implement architectures that alleviate the instability of GAN training. Ideally, GANs should solve the minimax optimization problem [13], but in practice alternating gradient descent methods do not clearly privilege minimax over maximin or vice versa (page 35, [12]), which may lead to training instability when there is a large discrepancy between the minimax and maximin objective values. The focus of this work is on improving the stability of this minimax game in the training process of GANs.

(Under review as a conference paper at ICLR 2019.)
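The instability this abstract targets appears already in the simplest minimax problem, $\min_x \max_y \; xy$: simultaneous gradient descent-ascent spirals away from the unique equilibrium $(0, 0)$, since each step multiplies the distance to the origin by $\sqrt{1 + \eta^2}$. A toy demonstration (our illustration, not the paper's experiment):

```python
import numpy as np

# Simultaneous gradient descent-ascent on min_x max_y f(x, y) = x * y.
# The unique saddle point is (0, 0), yet plain GDA diverges from it:
# ||(x, y)|| grows by a factor sqrt(1 + eta^2) at every step.
eta = 0.1
x, y = 1.0, 1.0
norms = []
for _ in range(100):
    gx, gy = y, x                       # grad_x f = y, grad_y f = x
    x, y = x - eta * gx, y + eta * gy   # descend in x, ascend in y
    norms.append(np.hypot(x, y))

# The iterates spiral outward instead of converging to the saddle point.
```

Multi-generator architectures such as the Stackelberg GAN proposed here are one way to reshape the game so that such minimax/maximin gaps, and the resulting oscillation, are provably controlled.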

Hidden Convexity of Wasserstein GANs: Interpretable Generative Models with Closed-Form Solutions Machine Learning

Generative Adversarial Networks (GANs) are commonly used for modeling complex distributions of data. Both the generators and discriminators of GANs are often modeled by neural networks, posing a non-transparent optimization problem which is non-convex and non-concave over the generator and discriminator, respectively. Such networks are often heuristically optimized with gradient descent-ascent (GDA), but it is unclear whether the optimization problem contains any saddle points, or whether heuristic methods can find them in practice. In this work, we analyze the training of Wasserstein GANs with two-layer neural network discriminators through the lens of convex duality, and for a variety of generators expose the conditions under which Wasserstein GANs can be solved exactly with convex optimization approaches, or can be represented as convex-concave games. Using this convex duality interpretation, we further demonstrate the impact of different activation functions of the discriminator. Our observations are verified with numerical results demonstrating the power of the convex interpretation, with applications in progressive training of convex architectures corresponding to linear generators and quadratic-activation discriminators for CelebA image generation.
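One concrete instance of the closed-form solutions mentioned above can be sketched as follows (under our reading of the abstract, not the authors' code): a quadratic-activation discriminator $f(x) = (w^\top x)^2$ only sees the second moment $\mathbb{E}[x x^\top]$, so fooling every such discriminator with a linear generator $x = Gz$, $z \sim \mathcal{N}(0, I)$, reduces to covariance matching $GG^\top = \Sigma$, solved in closed form by $G = \Sigma^{1/2}$.

```python
import numpy as np

# Closed-form "training" of a linear generator against quadratic-
# activation discriminators: match the target covariance exactly.
rng = np.random.default_rng(0)
a = rng.normal(size=(3, 3))
sigma = a @ a.T + 0.1 * np.eye(3)       # target covariance (SPD by construction)

# Sigma^{1/2} via the symmetric eigendecomposition Sigma = V diag(e) V^T.
evals, evecs = np.linalg.eigh(sigma)
g = evecs @ np.diag(np.sqrt(evals)) @ evecs.T

# The generator x = G z, z ~ N(0, I), has covariance G G^T = Sigma,
# so no quadratic-activation discriminator can separate it from the data.
```

This is the kind of interpretable, convex-optimization view the paper exploits for progressive training of linear-generator/quadratic-discriminator architectures.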