Variational approaches to approximate Bayesian inference provide very efficient means of performing parameter estimation and model selection. Among these, so-called variational-Laplace or VL schemes rely on Gaussian approximations to posterior densities on model parameters. In this note, we review the main variants of VL approaches, which follow from considering nonlinear models of continuous and/or categorical data. En passant, we also derive a few novel theoretical results that complement existing analyses of variational Bayesian approaches, including investigations of their asymptotic convergence. We also suggest practical ways of extending existing VL approaches to hierarchical generative models that include hyperparameters (e.g., precision hyperparameters).
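A minimal sketch of the Gaussian posterior at the core of VL schemes follows. It assumes a nonlinear model y = g(θ) + noise with a Gaussian prior N(m0, S0) and a *known* noise variance, and it finds the mode with undamped Gauss-Newton steps. The hyperparameter learning discussed in the note is deliberately left out, and the function name `gauss_newton_laplace` is hypothetical. For a linear g, the result should coincide with the exact Gaussian posterior.

```python
import numpy as np

def gauss_newton_laplace(y, g, jac, m0, S0, noise_var, n_iter=50):
    """Laplace-style Gaussian posterior N(m, S) for y = g(theta) + noise.

    Assumptions (not from the source): Gaussian prior N(m0, S0), fixed known
    noise variance, and plain (undamped) Gauss-Newton iterations.
    """
    P0 = np.linalg.inv(S0)                          # prior precision
    m = np.array(m0, dtype=float)
    for _ in range(n_iter):
        r = y - g(m)                                # prediction error at current mode
        J = jac(m)                                  # Jacobian of g at current mode
        P = J.T @ J / noise_var + P0                # Gauss-Newton approx. of -Hessian of log-joint
        grad = J.T @ r / noise_var - P0 @ (m - m0)  # gradient of log-joint
        m = m + np.linalg.solve(P, grad)            # Newton-type update of the mode
    J = jac(m)
    S = np.linalg.inv(J.T @ J / noise_var + P0)     # posterior covariance at the mode
    return m, S
```

For a nonlinear model, one would pass, for example, `g = lambda t: t[0] * np.exp(-t[1] * times)` together with its Jacobian. Without damping, convergence is not guaranteed in that case.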

So-called sparse estimators arise in the context of model fitting when one assumes a priori that only a few (unknown) model parameters deviate from zero. Sparsity constraints can be useful when the estimation problem is under-determined, i.e. when the number of model parameters is much higher than the number of data points. Typically, such constraints are enforced by penalizing the L1 norm of the parameters, which yields the so-called LASSO estimator. In this work, we propose a simple parameter transform that emulates sparse priors without sacrificing the simplicity and robustness of L2-norm regularization schemes. We show how L1 regularization can be obtained with a "sparsify" remapping of parameters under normal Bayesian priors, and we demonstrate the ensuing variational Laplace approach using Monte-Carlo simulations.
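The following sketch shows why a parameter remapping can turn an L2 penalty into an L1 penalty. It uses a *hypothetical* signed-square map θ = φ|φ|, chosen for illustration only; it is not necessarily the note's "sparsify" transform. Because φ² = |θ|, a Gaussian (L2) penalty λφ² on φ equals the LASSO penalty λ|θ| on θ, and the map is a bijection. Minimising the L2-penalised objective over φ therefore recovers the scalar LASSO solution, i.e. soft-thresholding.

```python
import numpy as np

def sparsify(phi):
    # Hypothetical signed-square remapping: theta = phi * |phi|, so phi**2 == |theta|.
    return phi * np.abs(phi)

def l2_objective(phi, y, lam):
    # Squared error on theta = sparsify(phi) plus an ordinary L2 (Gaussian-prior) penalty on phi.
    return 0.5 * (y - sparsify(phi)) ** 2 + lam * phi ** 2

def soft_threshold(y, lam):
    # Closed-form minimiser of 0.5*(y - theta)**2 + lam*|theta| (scalar LASSO).
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

def fit_scalar(y, lam, grid=np.linspace(-3.0, 3.0, 60001)):
    # Brute-force minimisation over phi on a fine grid, then map back to theta.
    phi_hat = grid[np.argmin(l2_objective(grid, y, lam))]
    return sparsify(phi_hat)
```

A grid search is used only to keep the sketch transparent. Any smooth optimiser over φ would do, since the objective in φ-space is differentiable everywhere.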

Artificial neural networks (NNs) have become the de facto standard in machine learning. They allow learning highly nonlinear transformations in a plethora of applications. However, NNs usually provide only point estimates, without systematically quantifying the corresponding uncertainties. In this paper, a novel approach towards fully Bayesian NNs is proposed, in which training and prediction of a perceptron are performed in closed form within the Bayesian inference framework. The weights and the predictions of the perceptron are treated as Gaussian random variables. Analytical expressions for predicting the perceptron's output and for learning the weights are provided for commonly used activation functions such as sigmoid or ReLU. This approach requires no computationally expensive gradient calculations and also allows sequential learning.
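The prediction step can be sketched for a single ReLU perceptron, assuming Gaussian weights w ~ N(m, C) and a fixed input x. The pre-activation a = wᵀx is then Gaussian, and the mean and variance of ReLU(a) follow from standard closed-form moments of a rectified Gaussian. This covers only the output moments; the weight-learning step of the paper is not reproduced, and the function names are hypothetical.

```python
import math
import numpy as np

def relu_gaussian_moments(mu, var):
    """Mean and variance of max(a, 0) for a ~ N(mu, var), var > 0."""
    sigma = math.sqrt(var)
    z = mu / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density at z
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal CDF at z
    mean = mu * cdf + sigma * pdf
    second = (mu * mu + var) * cdf + mu * sigma * pdf          # E[max(a, 0)^2]
    return mean, second - mean * mean

def perceptron_relu_moments(x, w_mean, w_cov):
    # Pre-activation a = w^T x is Gaussian with mean m^T x and variance x^T C x.
    mu = float(w_mean @ x)
    var = float(x @ w_cov @ x)
    return relu_gaussian_moments(mu, var)
```

No sampling or gradients are involved. The output moments are exact for this single-layer, Gaussian-weight setting.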

This note is concerned with an accurate and computationally efficient variational Bayesian (VB) treatment of mixed-effects modelling. We focus on group studies, i.e. empirical studies that report multiple measurements acquired in multiple subjects. When approached from a Bayesian perspective, such mixed-effects models typically rely upon a hierarchical generative model of the data, whereby both within- and between-subject effects contribute to the overall observed variance. The ensuing VB scheme can be used to assess statistical significance at the group level and/or to capture inter-individual differences. Alternatively, it can be seen as an adaptive regularization procedure, which iteratively learns the corresponding within-subject priors from estimates of the group distribution of effects of interest (cf. so-called "empirical Bayes" approaches). We outline the mathematical derivation of the ensuing VB scheme, whose open-source implementation is available as part of the VBA toolbox.
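The "adaptive regularization" reading can be illustrated with a deliberately simplified normal-normal model. Each subject's effect estimate is y_i ~ N(θ_i, s_i²) with known s_i², and the group prior is θ_i ~ N(μ, τ²). An EM loop alternates between computing within-subject posteriors under the current group prior and re-estimating the group mean and variance from those posteriors. This is a generic empirical-Bayes sketch, not the VB scheme of the note or code from the VBA toolbox, and the function name is hypothetical.

```python
import numpy as np

def empirical_bayes_em(y, s2, n_iter=500, mu=0.0, tau2=1.0):
    """EM for y_i ~ N(theta_i, s2_i), theta_i ~ N(mu, tau2) (simplified normal-normal model)."""
    y = np.asarray(y, dtype=float)
    s2 = np.broadcast_to(np.asarray(s2, dtype=float), y.shape)
    for _ in range(n_iter):
        # E-step: within-subject posteriors under the current group prior.
        post_var = 1.0 / (1.0 / s2 + 1.0 / tau2)
        post_mean = post_var * (y / s2 + mu / tau2)
        # M-step: group mean and variance learned from the subject posteriors.
        mu = post_mean.mean()
        tau2 = np.mean((post_mean - mu) ** 2 + post_var)
    # Final subject posteriors under the learned group prior (shrunk towards mu).
    post_var = 1.0 / (1.0 / s2 + 1.0 / tau2)
    post_mean = post_var * (y / s2 + mu / tau2)
    return mu, tau2, post_mean, post_var
```

With equal s_i², the fixed point gives μ equal to the sample mean and τ² equal to the sample variance minus s² (when positive). Each subject's posterior mean is pulled towards μ by the factor τ²/(τ² + s²).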

Donner, Christian, Opper, Manfred

We present an approximate Bayesian inference approach for estimating the intensity of an inhomogeneous Poisson process, where the intensity function is modelled using a Gaussian process (GP) prior via a sigmoid link function. By augmenting the model with a latent marked Poisson process and Pólya-Gamma random variables, we obtain a representation of the likelihood that is conjugate to the GP prior. We approximate the posterior using a free-form mean-field approximation together with the framework of sparse GPs. Furthermore, as an alternative approximation, we suggest a sparse Laplace approximation of the posterior, for which we derive an efficient expectation-maximisation algorithm to find the posterior's mode. Results of both algorithms compare well with exact inference obtained by a Markov chain Monte Carlo sampler and with the standard variational Gaussian approach, while being one order of magnitude faster.
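Pólya-Gamma mean-field schemes typically need the expectation of a PG(1, c) variable, where c is set from the current second moment of the GP at a given location. Treat that role as an assumption here rather than a detail confirmed by the abstract. The sketch below computes this expectation in closed form, E[ω] = tanh(c/2)/(2c), with limit 1/4 at c = 0. It then checks the result against the truncated infinite-sum representation of the PG(1, c) distribution. Function names are hypothetical.

```python
import math
import numpy as np

def pg_mean(c):
    """E[omega] for omega ~ PG(1, c): tanh(c/2) / (2c), with limit 1/4 at c = 0."""
    c = abs(float(c))
    if c < 1e-6:
        return 0.25 - c * c / 48.0  # Taylor expansion near c = 0
    return math.tanh(c / 2.0) / (2.0 * c)

def pg_mean_series(c, n_terms=200000):
    # Mean from omega = (1 / (2 pi^2)) * sum_k g_k / ((k - 1/2)^2 + c^2 / (4 pi^2)),
    # with g_k ~ Gamma(1, 1) so E[g_k] = 1; the sum is truncated at n_terms.
    k = np.arange(1, n_terms + 1)
    terms = 1.0 / ((k - 0.5) ** 2 + c ** 2 / (4.0 * math.pi ** 2))
    return float(np.sum(terms) / (2.0 * math.pi ** 2))
```

The closed form is what makes each mean-field update cheap. No sampling of the Pólya-Gamma variables is required.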