inference model
Appendix
We first introduce some useful concepts and results that keep the proof succinct and give more insight into our model and theory. We begin with some extended discussion of CSG. Note that a reparameterization $\Phi$ does not necessarily have its output dimensions in $\mathcal{S}$, i.e. $\Phi^{\mathcal{S}}(s,v)$, constant in $v$. The condition that $p(y|s) = p'(y|\Phi^{\mathcal{S}}(s,v))$ for any $v \in \mathcal{V}$ does not imply that $\Phi^{\mathcal{S}}(s,v)$ is constant in $v$, since $p'(y|s')$ may ignore the change of $s' = \Phi^{\mathcal{S}}(s,v)$ caused by the change of $v$. The following lemma shows the meaning of a reparameterization: it allows a CSG to vary while inducing the same distribution on the observed data variables $(x,y)$ (i.e., having the same effect in describing the data).

We can now define and verify an equivalence relation on CSGs, so that the resulting equivalence class contains CSGs that induce the same $(x,y)$ data distribution and hold the same semantic information in their $s$ variables. We say two CSGs $p$ and $p'$ are semantic-equivalent if there exists a homeomorphism$^{11}$ $\Phi$ on $\mathcal{S} \times \mathcal{V}$ such that (i) it is semantic-preserving: its output dimensions in $\mathcal{S}$ are constant in $v$, i.e. $\Phi^{\mathcal{S}}(s,v) = \Phi^{\mathcal{S}}(s)$ for any $v \in \mathcal{V}$, and (ii) it acts as a reparameterization from $p$ to $p'$: $\Phi_\#[p_{s,v}] = p'_{s,v}$, $p(x|s,v) = p'(x|\Phi(s,v))$ and $p(y|s) = p'(y|\Phi^{\mathcal{S}}(s))$. A.1 below shows that the defined binary relation is indeed an equivalence relation in common cases. As a reparameterization, $\Phi$ allows the two models to have different latent-variable parameterizations while inducing the same distribution on the observed data variables $(x,y)$ (Lemma 9). This definition of semantic-equivalence can be rephrased as the existence of a semantic-preserving reparameterization. With proper model assumptions, we can show that any reparameterization between two CSGs is semantic-preserving, so that semantic-preserving CSGs cannot be converted to each other by a reparameterization that mixes $s$ with $v$.

$^{11}$ A transformation is a homeomorphism if it is a continuous bijection with continuous inverse.

Lemma 11. For two CSGs $p$ and $p'$, if $p'(y|s)$ has a statistic $M'(s)$ that is an injective function of $s$, then any reparameterization $\Phi$ from $p$ to $p'$, if it exists, has its $\Phi^{\mathcal{S}}$ constant in $v$.

Proof. Let $M(s)$ denote the corresponding statistic of $p(y|s)$. Then the condition that $p(y|s) = p'(y|\Phi^{\mathcal{S}}(s,v))$ for any $v \in \mathcal{V}$ implies that $M(s) = M'(\Phi^{\mathcal{S}}(s,v))$. If there exist $s \in \mathcal{S}$ and $v^{(1)} \neq v^{(2)} \in \mathcal{V}$ such that $\Phi^{\mathcal{S}}(s,v^{(1)}) \neq \Phi^{\mathcal{S}}(s,v^{(2)})$, then $M'(\Phi^{\mathcal{S}}(s,v^{(1)})) \neq M'(\Phi^{\mathcal{S}}(s,v^{(2)}))$ since $M'$ is injective. This violates $M(s) = M'(\Phi^{\mathcal{S}}(s,v))$, which requires both $M'(\Phi^{\mathcal{S}}(s,v^{(1)}))$ and $M'(\Phi^{\mathcal{S}}(s,v^{(2)}))$ to be equal to $M(s)$.

We then introduce two mathematical facts. Let $z$ be a random variable on a Euclidean space $\mathbb{R}^{d_Z}$ with density function $p_z(z)$, and let $\Phi$ be a homeomorphism on $\mathbb{R}^{d_Z}$ whose inverse $\Phi^{-1}$ is differentiable.
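The setup just described (a density $p_z$ and a homeomorphism with differentiable inverse) is the one required by the standard change-of-variables rule for densities, $p_{\Phi(z)}(z') = p_z(\Phi^{-1}(z')) \, |\det J_{\Phi^{-1}}(z')|$. Whether this is exactly the fact stated next is an assumption, since the passage stops here. The following is a minimal one-dimensional numerical check of that rule, with an illustrative choice $\Phi = \sinh$ and $z \sim \mathcal{N}(0,1)$; all function names are hypothetical.

```python
import math
import numpy as np


def std_normal_pdf(z):
    """Density of the standard normal, playing the role of p_z."""
    return np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)


def pushforward_pdf(x):
    """Density of x = sinh(z) with z ~ N(0, 1), via change of variables.

    Illustrative assumption: Phi = sinh, so Phi^{-1} = arcsinh and, in 1-D,
    |det J_{Phi^{-1}}(x)| reduces to |d arcsinh(x) / dx| = 1 / sqrt(1 + x^2).
    """
    z = np.arcsinh(x)
    jac = 1.0 / np.sqrt(1.0 + x**2)
    return std_normal_pdf(z) * jac


def cdf_by_integration(a, lo=-200.0, n=400_001):
    """P(sinh(z) <= a), obtained by trapezoidal integration of pushforward_pdf."""
    xs = np.linspace(lo, a, n)
    ys = pushforward_pdf(xs)
    return float(np.sum((ys[1:] + ys[:-1]) * np.diff(xs)) / 2.0)


def cdf_exact(a):
    """Same probability computed directly: P(z <= arcsinh(a)) for z ~ N(0, 1)."""
    return 0.5 * (1.0 + math.erf(math.asinh(a) / math.sqrt(2.0)))


if __name__ == "__main__":
    for a in (-1.0, 0.0, 1.5):
        print(a, cdf_by_integration(a), cdf_exact(a))
```

The two columns printed for each threshold should agree closely, which is what the rule predicts: integrating the transformed density gives the same probabilities as mapping the event back through $\Phi^{-1}$.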
Improved Variational Inference with Inverse Autoregressive Flow
Durk P. Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, Max Welling
The framework of normalizing flows provides a general strategy for flexible variational inference of posteriors over latent variables. We propose a new type of normalizing flow, inverse autoregressive flow (IAF), that, in contrast to earlier published flows, scales well to high-dimensional latent spaces. The proposed flow consists of a chain of invertible transformations, where each transformation is based on an autoregressive neural network. In experiments, we show that IAF significantly improves upon diagonal Gaussian approximate posteriors. In addition, we demonstrate that a novel type of variational autoencoder, coupled with IAF, is competitive with neural autoregressive models in terms of attained log-likelihood on natural images, while allowing significantly faster synthesis.
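A minimal sketch of such a chain, assuming a single masked linear layer as a stand-in for each autoregressive neural network. The names `iaf_step` and `iaf_chain`, the weight shapes, and the order reversal between steps are illustrative assumptions and not the authors' implementation. Because the shift and scale for dimension i depend only on dimensions before i, the Jacobian of each step is triangular and its log-determinant is just the sum of the log-scales.

```python
import numpy as np


def iaf_step(z, W_mu, W_s, b_mu, b_s):
    """One flow step: z_new = mu(z) + sigma(z) * z with autoregressive mu, sigma.

    The strictly lower-triangular mask makes output i depend only on z[:i],
    so d z_new[i] / d z[i] = sigma[i] and the Jacobian is triangular.
    """
    d = z.shape[0]
    mask = np.tril(np.ones((d, d)), k=-1)
    mu = (W_mu * mask) @ z + b_mu
    log_sigma = (W_s * mask) @ z + b_s
    z_new = mu + np.exp(log_sigma) * z
    log_det = float(np.sum(log_sigma))  # log |det dz_new/dz|
    return z_new, log_det


def iaf_chain(z0, params):
    """Apply several steps, reversing the variable order after each one.

    The reversal is a permutation (|det| = 1); it lets later steps condition
    on dimensions that earlier steps could not see.
    """
    z, total_log_det = z0, 0.0
    for W_mu, W_s, b_mu, b_s in params:
        z, log_det = iaf_step(z, W_mu, W_s, b_mu, b_s)
        z = z[::-1]
        total_log_det += log_det
    return z, total_log_det


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 4
    params = [
        (0.1 * rng.normal(size=(d, d)), 0.1 * rng.normal(size=(d, d)),
         0.1 * rng.normal(size=d), 0.1 * rng.normal(size=d))
        for _ in range(3)
    ]
    z0 = rng.normal(size=d)  # sample from the base diagonal Gaussian
    zT, total = iaf_chain(z0, params)
    # Density of the transformed sample: log q(zT) = log q(z0) - total.
    log_q0 = float(-0.5 * np.sum(z0**2) - 0.5 * d * np.log(2.0 * np.pi))
    print(zT, log_q0 - total)
```

Every dimension of a step is computed in one matrix product rather than sequentially, which is the property that makes this kind of flow practical in high-dimensional latent spaces.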
Amortized Inference Regularization
The variational autoencoder (VAE) is a popular model for density estimation and representation learning. Canonically, the variational principle suggests preferring an expressive inference model so that the variational approximation is accurate. However, it is often overlooked that an overly expressive inference model can be detrimental to the test set performance of both the amortized posterior approximator and, more importantly, the generative density estimator. In this paper, we leverage the fact that VAEs rely on amortized inference and propose techniques for amortized inference regularization (AIR) that control the smoothness of the inference model. We demonstrate that, by applying AIR, it is possible to improve VAE generalization on both inference and generative performance. Our paper challenges the belief that amortized inference is simply a mechanism for approximating maximum likelihood training and illustrates that regularization of the amortization family provides a new direction for understanding and improving generalization in VAEs.
IntroVAE: Introspective Variational Autoencoders for Photographic Image Synthesis
We present a novel introspective variational autoencoder (IntroVAE) model for synthesizing high-resolution photographic images. IntroVAE is capable of self-evaluating the quality of its generated samples and improving itself accordingly. Its inference and generator models are jointly trained in an introspective way. On one hand, the generator is required to reconstruct the input images from the noisy outputs of the inference model, as in normal VAEs. On the other hand, the inference model is encouraged to classify between the generated and real samples while the generator tries to fool it, as in GANs. These two famous generative frameworks are integrated in a simple yet efficient single-stream architecture that can be trained in a single stage. IntroVAE preserves the advantages of VAEs, such as stable training and a nice latent manifold. Unlike most other hybrid models of VAEs and GANs, IntroVAE requires no extra discriminators, because the inference model itself serves as a discriminator to distinguish between the generated and real samples. Experiments demonstrate that our method produces high-resolution photo-realistic images (e.g., CELEBA images at $1024^2$), which are comparable to or better than those of state-of-the-art GANs.
Supplementary Material for Learning Energy-based Model via Dual-MCMC Teaching
We show additional image synthesis results in Fig.2. For the numbers reported in the main text, we adopt the network structure that contains Residue Blocks (see implementation details in Tab.5). We then test our model on the task of image inpainting. As shown in Fig.1, our [...]
This is the marginal version of Eqn.8 shown in the main text.
2.3 Learning Algorithm
Three models are trained in an alternating and iterative manner based on the current model parameters. Compared to Eqn.3 and Eqn.6 in the main text, Eqn.5 and Eqn.6 start with initial points initialized [...] We present the learning algorithm in Alg.1.