In the perfect equilibrium, the generator would capture the general training data distribution. As a result, the discriminator would be always unsure of whether its inputs are real or not. In the DCGAN paper, the authors describe the combination of some deep learning techniques as key for training GANs. These techniques include: (i) the all convolutional net and (ii) Batch Normalization (BN). The first emphasizes strided convolutions (instead of pooling layers) for both: increasing and decreasing feature's spatial dimensions.
This post is part of the "superblog" that is the collective work of the participants of the GAN workshop organized by Aggregate Intellect. This post serves as a proof of work, and covers some of the concepts covered in the workshop in addition to advanced concepts pursued by the participants. The original GAN (Goodfellow, 2014) (https://arxiv.org/abs/1406.2661) is a generative model, where a neural-network is trained to generate realistic images from random noisy input data. GANs generate predicted data by exploiting a competition between two neural networks, a generator (G) and a discriminator (D), where both networks are engaged in prediction tasks. G generates "fake" images from the input data, and D compares the predicted data (output from G) to the real data with results fed back to G. The cyclical loop between G and D is repeated several times to minimize the difference between predicted and ground truth data sets and improve the performance of G, i.e., D is used to improve the performance of G.
This is the Part 4 of a short series of posts introducing and building generative adversarial networks, known as GANs. Previously: Part 1 introduced the idea of adversarial learning and we started to build the machinery of a GAN implementation. Part 2 we extended our code to learn a simple 1-dimensional pattern 1010. Part 3 we developed our code to learn to generate 2-dimensional grey-scale images that look like handwritten digits In this post we'll extend our code again to lean to generate full-colour images, learning from a dataset of celebrity face photos. The ideas should be the same, and the code shouldn't need much new added to it. Celebrity Faces A popular dataset for human faces is the celebA dataset which contains 202,599 photos, annotated with some features. A revised version was developed, called the aligned celebA dataset, where the location of the eyes is consistent across the dataset and the orientation of the heads is vertical so the mouth is below the eyes were possible. The following shows 6 samples from the dataset.
GANs are one of the very few machine learning techniques which has given good performance for generative tasks, or more broadly unsupervised learning. In particular, they have given splendid performance for a variety of image generation related tasks. Yann LeCun, one of the forefathers of deep learning, has called them "the best idea in machine learning in the last 10 years". Most importantly, the core conceptual ideas associated with a GAN are quite simple to understand (and in-fact, you should have a good idea about them by the time you finish reading this article). In this article, we'll explain GANs by applying them to the task of generating images.
The main idea of this paper is to explore the possibilities of generating samples from the neural networks, mostly focusing on the colorization of the grey-scale images. I will compare the existing methods for colorization and explore the possibilities of using new generative modeling to the task of colorization. The contributions of this paper are to compare the existing structures with similar generating structures(Decoders) and to apply the novel structures including Conditional VAE(CVAE), Conditional Wasserstein GAN with Gradient Penalty(CWGAN-GP), CWGAN-GP with L1 reconstruction loss, Adversarial Generative Encoders(AGE) and Introspective VAE(IVAE). I trained these models using CIFAR-10 images. To measure the performance, I use Inception Score(IS) which measures how distinctive each image is and how diverse overall samples are as well as human eyes for CIFAR-10 images. It turns out that CVAE with L1 reconstruction loss and IVAE achieve the highest score in IS. CWGAN-GP with L1 tends to learn faster than CWGAN-GP, but IS does not increase from CWGAN-GP. CWGAN-GP tends to generate more diverse images than other models using reconstruction loss. Also, I figured out that the proper regularization plays a vital role in generative modeling.