In the perfect equilibrium, the generator would capture the general training data distribution. As a result, the discriminator would be always unsure of whether its inputs are real or not. In the DCGAN paper, the authors describe the combination of some deep learning techniques as key for training GANs. These techniques include: (i) the all convolutional net and (ii) Batch Normalization (BN). The first emphasizes strided convolutions (instead of pooling layers) for both: increasing and decreasing feature's spatial dimensions.
This is the Part 4 of a short series of posts introducing and building generative adversarial networks, known as GANs. Previously: Part 1 introduced the idea of adversarial learning and we started to build the machinery of a GAN implementation. Part 2 we extended our code to learn a simple 1-dimensional pattern 1010. Part 3 we developed our code to learn to generate 2-dimensional grey-scale images that look like handwritten digits In this post we'll extend our code again to lean to generate full-colour images, learning from a dataset of celebrity face photos. The ideas should be the same, and the code shouldn't need much new added to it. Celebrity Faces A popular dataset for human faces is the celebA dataset which contains 202,599 photos, annotated with some features. A revised version was developed, called the aligned celebA dataset, where the location of the eyes is consistent across the dataset and the orientation of the heads is vertical so the mouth is below the eyes were possible. The following shows 6 samples from the dataset.
GANs are one of the very few machine learning techniques which has given good performance for generative tasks, or more broadly unsupervised learning. In particular, they have given splendid performance for a variety of image generation related tasks. Yann LeCun, one of the forefathers of deep learning, has called them "the best idea in machine learning in the last 10 years". Most importantly, the core conceptual ideas associated with a GAN are quite simple to understand (and in-fact, you should have a good idea about them by the time you finish reading this article). In this article, we'll explain GANs by applying them to the task of generating images.
The main idea of this paper is to explore the possibilities of generating samples from the neural networks, mostly focusing on the colorization of the grey-scale images. I will compare the existing methods for colorization and explore the possibilities of using new generative modeling to the task of colorization. The contributions of this paper are to compare the existing structures with similar generating structures(Decoders) and to apply the novel structures including Conditional VAE(CVAE), Conditional Wasserstein GAN with Gradient Penalty(CWGAN-GP), CWGAN-GP with L1 reconstruction loss, Adversarial Generative Encoders(AGE) and Introspective VAE(IVAE). I trained these models using CIFAR-10 images. To measure the performance, I use Inception Score(IS) which measures how distinctive each image is and how diverse overall samples are as well as human eyes for CIFAR-10 images. It turns out that CVAE with L1 reconstruction loss and IVAE achieve the highest score in IS. CWGAN-GP with L1 tends to learn faster than CWGAN-GP, but IS does not increase from CWGAN-GP. CWGAN-GP tends to generate more diverse images than other models using reconstruction loss. Also, I figured out that the proper regularization plays a vital role in generative modeling.
Generative flows are attractive because they admit exact likelihood optimization and efficient image synthesis. Recently, Kingma & Dhariwal (2018) demonstrated with Glow that generative flows are capable of generating high quality images. We generalize the 1 x 1 convolutions proposed in Glow to invertible d x d convolutions, which are more flexible since they operate on both channel and spatial axes. We propose two methods to produce invertible convolutions that have receptive fields identical to standard convolutions: Emerging convolutions are obtained by chaining specific autoregressive convolutions, and periodic convolutions are decoupled in the frequency domain. Our experiments show that the flexibility of d x d convolutions significantly improves the performance of generative flow models on galaxy images, CIFAR10 and ImageNet.