Highlight: In this post, we discuss Variational Autoencoders (VAEs). To fully understand the underlying ideas, we need a basic understanding of traditional autoencoders, which we have already covered in our previous posts. The post covers several topics. First, we review autoencoders. Then, we review some basic probability concepts. Next, we explain what the Kullback-Leibler divergence is. Finally, we discuss the loss function and how it can be derived.

The first concept to review is the autoencoder. When autoencoders are built from several layers, they are also called stacked autoencoders. One application of autoencoders is image compression. The pipeline is commonly presented as a block diagram: an input image goes into the encoder part of the network. The input can be a simple image, such as one from the MNIST data set, for example the digit \(3\).

Once this digit passes through the network, we want the output to reconstruct the original image as closely as possible. To measure how close the reconstruction is, we use a cost function \(L\), which depends on two sets of parameters, \(\theta\) and \(\phi\).
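The reconstruction objective described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the post's actual model: the encoder and decoder are single dense layers, \(\phi\) is taken to parametrize the encoder and \(\theta\) the decoder (a common convention that the text itself does not spell out), \(L\) is taken to be the mean squared reconstruction error, and random values stand in for MNIST images. All function names are hypothetical.

```python
import numpy as np

def init_params(input_dim, latent_dim, seed=0):
    """Create small random weights. Assumption: phi = encoder, theta = decoder."""
    rng = np.random.default_rng(seed)
    phi = {"W": rng.normal(0.0, 0.1, (input_dim, latent_dim)),
           "b": np.zeros(latent_dim)}
    theta = {"W": rng.normal(0.0, 0.1, (latent_dim, input_dim)),
             "b": np.zeros(input_dim)}
    return theta, phi

def encode(x, phi):
    """Map each input row to a lower-dimensional code."""
    return np.tanh(x @ phi["W"] + phi["b"])

def decode(z, theta):
    """Map each code back to pixel space; the sigmoid keeps outputs in (0, 1)."""
    logits = z @ theta["W"] + theta["b"]
    return 1.0 / (1.0 + np.exp(-logits))

def reconstruction_loss(x, theta, phi):
    """Assumed form of L(theta, phi): squared error per image, averaged over the batch."""
    x_hat = decode(encode(x, phi), theta)
    return float(np.mean(np.sum((x - x_hat) ** 2, axis=1)))

# Random stand-in for a batch of 8 flattened 28x28 images with pixels in [0, 1].
x = np.random.default_rng(1).random((8, 784))
theta, phi = init_params(input_dim=784, latent_dim=32)
loss = reconstruction_loss(x, theta, phi)
print(f"reconstruction loss before training: {loss:.3f}")
```

Training would then adjust \(\theta\) and \(\phi\) to reduce this value, typically by gradient descent in a framework that provides automatic differentiation.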

Keras is a very well-designed library that clearly abides by its guiding principles of modularity and extensibility, enabling us to easily assemble powerful, complex models from primitive building blocks. This has been demonstrated in numerous blog posts and tutorials, in particular the excellent tutorial on Building Autoencoders in Keras. As the name suggests, that tutorial provides examples of how to implement various kinds of autoencoders in Keras, including the variational autoencoder (VAE) [1]. Like all autoencoders, the variational autoencoder is primarily used for unsupervised learning of hidden representations.

Figure: Visualization of the 2D manifold of MNIST digits (left) and the representation of digits in latent space, colored according to their digit labels (right).

Makhzani, Alireza, Frey, Brendan J.

In this paper, we describe the "PixelGAN autoencoder", a generative autoencoder in which the generative path is a convolutional autoregressive neural network on pixels (PixelCNN) that is conditioned on a latent code, and the recognition path uses a generative adversarial network (GAN) to impose a prior distribution on the latent code. We show that different priors result in different decompositions of information between the latent code and the autoregressive decoder. For example, by imposing a Gaussian distribution as the prior, we can achieve a global vs. local decomposition, or by imposing a categorical distribution as the prior, we can disentangle the style and content information of images in an unsupervised fashion. We further show how the PixelGAN autoencoder with a categorical prior can be directly used in semi-supervised settings and achieve competitive semi-supervised classification results on the MNIST, SVHN and NORB datasets.

Anomaly detection is one of those domains in which machine learning has made such an impact that today it almost goes without saying that an anomaly detection system must be based on some form of automatic pattern-learning algorithm rather than on a set of rules or descriptive statistics (though many reliable anomaly detection systems use such methods very successfully and efficiently). Indeed, a variety of ML approaches to anomaly detection have become increasingly popular over the past decade or so. Some approaches, such as One-Class SVM, try to identify the "normal" region of the feature space in which the data lie and then mark as anomalous any sample that falls outside that region. Other approaches estimate the parameters of a distribution (or a mixture of distributions) that represents the training data and then designate as anomalous any sample that is considerably less likely under it. Each approach has its own assumptions and weaknesses that need to be taken into account, which is partly why it is important to test the anomaly detection algorithm on, and fit it to, the particular domain.
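The distribution-based approach mentioned above can be illustrated with a small NumPy sketch. It rests on several assumptions that are not from the text: a single multivariate Gaussian is used as the distribution, synthetic 2D data stand in for real training data, and the cutoff is set at the 1% quantile of the training log-densities. The helper names are hypothetical.

```python
import numpy as np

def fit_gaussian(x):
    """Estimate the mean and covariance of the training data."""
    mean = x.mean(axis=0)
    # A small ridge keeps the covariance matrix invertible.
    cov = np.cov(x, rowvar=False) + 1e-6 * np.eye(x.shape[1])
    return mean, cov

def log_density(x, mean, cov):
    """Log of the multivariate normal density, evaluated for each row of x."""
    d = mean.shape[0]
    diff = x - mean
    solved = np.linalg.solve(cov, diff.T).T
    maha = np.sum(diff * solved, axis=1)  # squared Mahalanobis distance
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)

# Synthetic "normal" data (assumption: a 2D standard normal).
rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(1000, 2))
mean, cov = fit_gaussian(train)

# Assumed cutoff: anything less likely than 99% of the training points.
threshold = np.quantile(log_density(train, mean, cov), 0.01)

def is_anomaly(x):
    """Flag samples whose log-density falls below the threshold."""
    return log_density(np.atleast_2d(x), mean, cov) < threshold

# One point near the bulk of the data, one far away from it.
flags = is_anomaly(np.array([[0.1, -0.2], [8.0, 8.0]]))
print(flags)
```

The choice of quantile is a domain decision: a lower cutoff flags fewer samples, which is one of the assumptions that has to be tested against the particular data.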

In this paper, we describe the "implicit autoencoder" (IAE), a generative autoencoder in which both the generative path and the recognition path are parametrized by implicit distributions. We use two generative adversarial networks to define the reconstruction and the regularization cost functions of the implicit autoencoder, and derive the learning rules based on maximum-likelihood learning. Using implicit distributions allows us to learn more expressive posterior and conditional likelihood distributions for the autoencoder. Learning an expressive conditional likelihood distribution enables the latent code to only capture the abstract and high-level information of the data, while the remaining information is captured by the implicit conditional likelihood distribution. For example, we show that implicit autoencoders can disentangle the global and local information, and perform deterministic or stochastic reconstructions of the images. We further show that implicit autoencoders can disentangle discrete underlying factors of variation from the continuous factors in an unsupervised fashion, and perform clustering and semi-supervised learning.