Goto

Collaborating Authors

IntroVAE: Introspective Variational Autoencoders for Photographic Image Synthesis

Neural Information Processing Systems

We present a novel introspective variational autoencoder (IntroVAE) model for synthesizing high-resolution photographic images. IntroVAE is capable of self-evaluating the quality of its generated samples and improving itself accordingly. Its inference and generator models are jointly trained in an introspective way. On one hand, the generator is required to reconstruct the input images from the noisy outputs of the inference model as normal VAEs. On the other hand, the inference model is encouraged to classify between the generated and real samples while the generator tries to fool it as GANs. These two famous generative frameworks are integrated in a simple yet efficient single-stream architecture that can be trained in a single stage. IntroVAE preserves the advantages of VAEs, such as stable training and nice latent manifold. Unlike most other hybrid models of VAEs and GANs, IntroVAE requires no extra discriminators, because the inference model itself serves as a discriminator to distinguish between the generated and real samples. Experiments demonstrate that our method produces high-resolution photo-realistic images (e.g., CELEBA images at \(1024^{2}\)), which are comparable to or better than the state-of-the-art GANs.


IntroVAE: Introspective Variational Autoencoders for Photographic Image Synthesis

Neural Information Processing Systems

We present a novel introspective variational autoencoder (IntroVAE) model for synthesizing high-resolution photographic images. IntroVAE is capable of self-evaluating the quality of its generated samples and improving itself accordingly. Its inference and generator models are jointly trained in an introspective way. On one hand, the generator is required to reconstruct the input images from the noisy outputs of the inference model as normal VAEs. On the other hand, the inference model is encouraged to classify between the generated and real samples while the generator tries to fool it as GANs. These two famous generative frameworks are integrated in a simple yet efficient single-stream architecture that can be trained in a single stage. IntroVAE preserves the advantages of VAEs, such as stable training and nice latent manifold. Unlike most other hybrid models of VAEs and GANs, IntroVAE requires no extra discriminators, because the inference model itself serves as a discriminator to distinguish between the generated and real samples. Experiments demonstrate that our method produces high-resolution photo-realistic images (e.g., CELEBA images at \(1024^{2}\)), which are comparable to or better than the state-of-the-art GANs.


Multi-Level Variational Autoencoder: Learning Disentangled Representations From Grouped Observations

AAAI Conferences

We would like to learn a representation of the data that reflects the semantics behind a specific grouping of the data, where within a group the samples share a common factor of variation. For example, consider a set of face images grouped by identity. We wish to anchor the semantics of the grouping into a disentangled representation that we can exploit. However, existing deep probabilistic models often assume that the samples are independent and identically distributed, thereby disregard the grouping information. We present the Multi-Level Variational Autoencoder (ML-VAE), a new deep probabilistic model for learning a disentangled representation of grouped data. The ML-VAE separates the latent representation into semantically relevant parts by working both at the group level and the observation level, while retaining efficient test-time inference. We experimentally show that our model (i) learns a semantically meaningful disentanglement, (ii) enables control over the latent representation, and (iii) generalises to unseen groups.


Ladder Variational Autoencoders

Neural Information Processing Systems

Variational autoencoders are powerful models for unsupervised learning. However deep models with several layers of dependent stochastic variables are difficult to train which limits the improvements obtained using these highly expressive models. We propose a new inference model, the Ladder Variational Autoencoder, that recursively corrects the generative distribution by a data dependent approximate likelihood in a process resembling the recently proposed Ladder Network. We show that this model provides state of the art predictive log-likelihood and tighter log-likelihood lower bound compared to the purely bottom-up inference in layered Variational Autoencoders and other generative models. We provide a detailed analysis of the learned hierarchical latent representation and show that our new inference model is qualitatively different and utilizes a deeper more distributed hierarchy of latent variables.


Training Variational Autoencoders with Buffered Stochastic Variational Inference

arXiv.org Machine Learning

The recognition network in deep latent variable models such as variational autoencoders (VAEs) relies on amortized inference for efficient posterior approximation that can scale up to large datasets. However, this technique has also been demonstrated to select suboptimal variational parameters, often resulting in considerable additional error called the amortization gap. To close the amortization gap and improve the training of the generative model, recent works have introduced an additional refinement step that applies stochastic variational inference (SVI) to improve upon the variational parameters returned by the amortized inference model. In this paper, we propose the Buffered Stochastic Variational Inference (BSVI), a new refinement procedure that makes use of SVI's sequence of intermediate variational proposal distributions and their corresponding importance weights to construct a new generalized importance-weighted lower bound. We demonstrate empirically that training the variational autoencoders with BSVI consistently out-performs SVI, yielding an improved training procedure for VAEs.