Variational Bayes (VB) is a scalable alternative to Markov chain Monte Carlo (MCMC) for Bayesian posterior inference. Though popular, VB comes with few theoretical guarantees, most of which focus on well-specified models. However, models are rarely well-specified in practice. In this work, we study VB under model misspecification. We prove the VB posterior is asymptotically normal and centers at the value that minimizes the Kullback-Leibler (KL) divergence to the true data-generating distribution. Moreover, the VB posterior mean centers at the same value and is also asymptotically normal. These results generalize the variational Bernstein--von Mises theorem [29] to misspecified models. As a consequence of these results, we find that the model misspecification error dominates the variational approximation error in VB posterior predictive distributions. It explains the widely observed phenomenon that VB achieves comparable predictive accuracy with MCMC even though VB uses an approximating family. As illustrations, we study VB under three forms of model misspecification, ranging from model over-/under-dispersion to latent dimensionality misspecification. We conduct two simulation studies that demonstrate the theoretical results.

Montufar, Guido, Rauh, Johannes, Ay, Nihat

We present explicit classes of probability distributions that can be learned by Restricted Boltzmann Machines (RBMs) depending on the number of units that they contain, and which are representative for the expressive power of the model. We use this to show that the maximal Kullback-Leibler divergence to the RBM model with $n$ visible and $m$ hidden units is bounded from above by $n - \left\lfloor \log(m+1) \right\rfloor - \frac{m+1}{2^{\left\lfloor\log(m+1)\right\rfloor}} \approx (n -1) - \log(m+1)$. In this way we can specify the number of hidden units that guarantees a sufficiently rich model containing different classes of distributions and respecting a given error tolerance.

In this paper, we present a new nonintrusive reduced basis method when a cheap low-fidelity model and expensive high-fidelity model are available. The method relies on proper orthogonal decomposition (POD) to generate the high-fidelity reduced basis and a shallow multilayer perceptron to learn the high-fidelity reduced coefficients. In contrast to other methods, one distinct feature of the proposed method is to incorporate the features extracted from the low-fidelity data as the input feature, this approach not only improves the predictive capability of the neural network but also enables the decoupling the high-fidelity simulation from the online stage. Due to its nonin-trusive nature, it is applicable to general parameterized problems. We also provide several numerical examples to illustrate the effectiveness and performance of the proposed method.

Montufar, Guido, Ay, Nihat, Ghazi-Zahedi, Keyan

Conditional restricted Boltzmann machines are undirected stochastic neural networks with a layer of input and output units connected bipartitely to a layer of hidden units. These networks define models of conditional probability distributions on the states of the output units given the states of the input units, parametrized by interaction weights and biases. We address the representational power of these models, proving results their ability to represent conditional Markov random fields and conditional distributions with restricted supports, the minimal size of universal approximators, the maximal model approximation errors, and on the dimension of the set of representable conditional distributions. We contribute new tools for investigating conditional probability models, which allow us to improve the results that can be derived from existing work on restricted Boltzmann machine probability models.

For Deep Learning & Machine Learning engineers out of all the probabilistic models in the world, Gaussian distribution model simply stands out. Even if you have never worked on an AI project, there is a significant chance that you have come across the Gaussian model. Gaussian distribution model, often identified with its iconic bell shaped curve, also referred as Normal distribution, is so popular mainly because of three reasons. All models are wrong but some are useful! Incredible number of processes in nature and social sciences naturally follows the Gaussian distribution.