Complex computer simulators are increasingly used across fields of science as generative models tying parameters of an underlying theory to experimental observations. Inference in this setup is often difficult, as simulators rarely admit a tractable density or likelihood function. We introduce Adversarial Variational Optimization (AVO), a likelihood-free inference algorithm for fitting a non-differentiable generative model incorporating ideas from generative adversarial networks, variational optimization and empirical Bayes. We adapt the training procedure of Wasserstein GANs by replacing the differentiable generative network with a domain-specific simulator. We solve the resulting non-differentiable minimax problem by minimizing variational upper bounds of the two adversarial objectives. Effectively, the procedure results in learning a proposal distribution over simulator parameters, such that the Wasserstein distance between the marginal distribution of the synthetic data and the empirical distribution of observed data is minimized. We present results of the method with simulators producing both discrete and continuous data.

We derive a novel variational expectation maximization approach based on truncated variational distributions. Truncated distributions are proportional to exact posteriors within a subset of a discrete state space and equal zero otherwise. The novel variational approach is realized by first generalizing the standard variational EM framework to include variational distributions with exact (`hard') zeros. A fully variational treatment of truncated distributions then allows for deriving novel and mathematically grounded results, which in turn can be used to formulate novel efficient algorithms to optimize the parameters of probabilistic generative models. We find the free energies which correspond to truncated distributions to be given by concise and efficiently computable expressions, while update equations for model parameters (M-steps) remain in their standard form. Furthermore, we obtain generic expressions for expectation values w.r.t. truncated distributions. Based on these observations, we show how efficient and easily applicable meta-algorithms can be formulated that guarantee a monotonic increase of the free energy. Example applications of the here derived framework provide novel theoretical results and learning procedures for latent variable models as well as mixture models including procedures to tightly couple sampling and variational optimization approaches. Furthermore, by considering a special case of truncated variational distributions, we can cleanly and fully embed the well-known `hard EM' approaches into the variational EM framework, and we show that `hard EM' (for models with discrete latents) provably optimizes a lower free energy bound of the data log-likelihood.

Rainforth, Tom, Kosiorek, Adam R., Le, Tuan Anh, Maddison, Chris J., Igl, Maximilian, Wood, Frank, Teh, Yee Whye

We provide theoretical and empirical evidence that using tighter evidence lower bounds (ELBOs) can be detrimental to the process of learning an inference network by reducing the signal-to-noise ratio of the gradient estimator. Our results call into question common implicit assumptions that tighter ELBOs are better variational objectives for simultaneous model learning and inference amortization schemes. Based on our insights, we introduce three new algorithms: the partially importance weighted auto-encoder (PIWAE), the multiply importance weighted auto-encoder (MIWAE), and the combination importance weighted auto-encoder (CIWAE), each of which includes the standard importance weighted auto-encoder (IWAE) as a special case. We show that each can deliver improvements over IWAE, even when performance is measured by the IWAE target itself. Furthermore, our results suggest that PIWAE may be able to deliver simultaneous improvements in the training of both the inference and generative networks.

Ranganath, Rajesh, Tran, Dustin, Altosaar, Jaan, Blei, David

Variational inference is an umbrella term for algorithms which cast Bayesian inference as optimization. Classically, variational inference uses the Kullback-Leibler divergence to define the optimization. Though this divergence has been widely used, the resultant posterior approximation can suffer from undesirable statistical properties. To address this, we reexamine variational inference from its roots as an optimization problem. We use operators, or functions of functions, to design variational objectives. As one example, we design a variational objective with a Langevin-Stein operator. We develop a black box algorithm, operator variational inference (OPVI), for optimizing any operator objective. Importantly, operators enable us to make explicit the statistical and computational tradeoffs for variational inference. We can characterize different properties of variational objectives, such as objectives that admit data subsampling---allowing inference to scale to massive data---as well as objectives that admit variational programs---a rich class of posterior approximations that does not require a tractable density. We illustrate the benefits of OPVI on a mixture model and a generative model of images.