Bayesian Inference
Model Selection for Bayesian Autoencoders: Supplementary Material
Ba-Hien Tran, EURECOM (France); Simone Rossi
In this section, we review some key results on the Wasserstein distance. The formulation in Eq. 6 is obtained by employing [...] We use a single multilayer perceptron (MLP) layer with normalized output as the h function. Calculating the Wasserstein distance with the empirical distribution function is computationally attractive. [...] Metropolis steps to accommodate numerical errors stemming from the integration. F.1 Experimental environment: In our experiments, we use 4 workstations, which have the following specifications: GPU: NVIDIA Tesla P100 PCIe 16 GB.
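For intuition on why the empirical distribution function makes the Wasserstein distance cheap to compute, here is a minimal sketch. It assumes one-dimensional samples of equal size, in which case the p-Wasserstein distance between the two empirical distributions reduces to comparing sorted samples. The function name `empirical_wasserstein_1d` is hypothetical and not taken from the paper, and the paper's own formulation (Eq. 6) is not reproduced here.

```python
def empirical_wasserstein_1d(xs, ys, p=1):
    """p-Wasserstein distance between two 1D empirical distributions.

    Assumption (not from the paper): both samples have the same size,
    so the optimal coupling pairs the i-th smallest x with the i-th smallest y.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    if not xs:
        raise ValueError("samples must be non-empty")
    if p < 1:
        raise ValueError("p must be at least 1")

    # Sorting is the only non-trivial cost: O(n log n).
    xs_sorted = sorted(xs)
    ys_sorted = sorted(ys)

    # Average transport cost between matched order statistics.
    mean_cost = sum(abs(a - b) ** p for a, b in zip(xs_sorted, ys_sorted)) / len(xs)
    return mean_cost ** (1.0 / p)


if __name__ == "__main__":
    # Toy samples, chosen only for illustration.
    print(empirical_wasserstein_1d([0.0, 1.0, 2.0], [3.0, 1.0, 2.0], p=1))
```

Sliced variants of the Wasserstein distance apply this kind of one-dimensional computation to projections of higher-dimensional samples, which is one reason the sorted-sample form is attractive in practice.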
Stochastic Stein Discrepancies
Stein discrepancies (SDs) monitor convergence and non-convergence in approximate inference when exact integration and sampling are intractable. However, computing a Stein discrepancy can be prohibitive if the Stein operator, which is often a sum over likelihood terms or potentials, is expensive to evaluate.
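To make the cost issue concrete, the sketch below computes a one-dimensional kernel Stein discrepancy in which the score function is a sum over data terms, and then swaps in a minibatch estimate of that score drawn independently for each sample point. This is only one plausible reading of "stochastic" here, not the authors' implementation. The toy model (Gaussian prior on a mean, unit-variance Gaussian likelihood), the RBF base kernel, and all function names are assumptions made for illustration.

```python
import math
import random


def full_score(theta, data, prior_var=10.0):
    """Gradient of the log posterior: one prior term plus a sum over all data terms."""
    return -theta / prior_var + sum(y - theta for y in data)


def minibatch_score(theta, data, batch_size, rng, prior_var=10.0):
    """Unbiased estimate of full_score from a uniformly sampled minibatch."""
    batch = rng.sample(data, batch_size)
    scale = len(data) / batch_size  # reweight so the expectation matches the full sum
    return -theta / prior_var + scale * sum(y - theta for y in batch)


def kernel_stein_discrepancy(points, scores, bandwidth=1.0):
    """V-statistic kernel Stein discrepancy in 1D with an RBF base kernel.

    `scores[i]` is the (exact or estimated) score at `points[i]`.
    """
    h2 = bandwidth ** 2
    total = 0.0
    for x, sx in zip(points, scores):
        for y, sy in zip(points, scores):
            d = x - y
            k = math.exp(-d * d / (2.0 * h2))
            dk_dx = -d / h2 * k
            dk_dy = d / h2 * k
            d2k_dxdy = (1.0 / h2 - d * d / (h2 * h2)) * k
            # Langevin Stein kernel evaluated at the pair (x, y).
            total += sx * sy * k + sx * dk_dy + sy * dk_dx + d2k_dxdy
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(total, 0.0)) / len(points)


if __name__ == "__main__":
    rng = random.Random(0)
    data = [0.5, 1.5, 1.0, 2.0]   # toy observations
    points = [0.9, 1.1, 1.3]      # toy approximate posterior samples

    exact = kernel_stein_discrepancy(points, [full_score(t, data) for t in points])
    stochastic = kernel_stein_discrepancy(
        points, [minibatch_score(t, data, 2, rng) for t in points]
    )
    print(exact, stochastic)
```

In this sketch each score evaluation touches only `batch_size` data terms instead of all of them, which is where the saving comes from when the dataset is large.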
Lower Bounds on Metropolized Sampling Methods for Well-Conditioned Distributions
Yin Tat Lee, Ruoqi Shen, Kevin Tian
Sampling from a continuous distribution in high dimensions is a fundamental problem in algorithm design. As sampling serves as a key subroutine in a variety of tasks in machine learning [AdFDJ03], statistical methods [RC99], and scientific computing [Liu01], it is an important undertaking to understand the complexity of sampling from families of distributions arising in applications. The more restricted problem of sampling from a particular family of distributions, which we call "well-conditioned distributions," has garnered a substantial amount of recent research effort from the algorithmic learning and statistics communities. This specific family is interesting for a number of reasons. First of all, it is practically relevant: Bayesian methods have found increasing use in machine learning applications [Bar12], and many distributions arising from these methods are well-conditioned, such as multivariate Gaussians, mixture models with small separation, and densities arising from Bayesian logistic regression with a Gaussian prior [DCWY18].