PAC-Bayesian Theory Meets Bayesian Inference

Neural Information Processing Systems

We exhibit a strong link between frequentist PAC-Bayesian bounds and the Bayesian marginal likelihood. That is, for the negative log-likelihood loss function, we show that the minimization of PAC-Bayesian generalization bounds maximizes the Bayesian marginal likelihood. This provides an alternative explanation to the Bayesian Occam's razor criteria, under the assumption that the data is generated by an i.i.d. distribution. Moreover, as the negative log-likelihood is an unbounded loss function, we motivate and propose a PAC-Bayesian theorem tailored for the sub-gamma loss family, and we show that our approach is sound on classical Bayesian linear regression tasks.

Quantification of observed prior and likelihood information in parametric Bayesian modeling Machine Learning

Two data-dependent information metrics are developed to quantify the information of the prior and likelihood functions within a parametric Bayesian model, one of which is closely related to the reference priors from Berger, Bernardo, and Sun, and information measure introduced by Lindley. A combination of theoretical, empirical, and computational support provides evidence that these information-theoretic metrics may be useful diagnostic tools when performing a Bayesian analysis.

Automated Analytic Asymptotic Evaluation of the Marginal Likelihood for Latent Models Machine Learning

We present and implement two algorithms for analytic asymptotic evaluation of the marginal likelihood of data given a Bayesian network with hidden nodes. As shown by previous work, this evaluation is particularly hard for latent Bayesian network models, namely networks that include hidden variables, where asymptotic approximation deviates from the standard BIC score. Our algorithms solve two central difficulties in asymptotic evaluation of marginal likelihood integrals, namely, evaluation of regular dimensionality drop for latent Bayesian network models and computation of non-standard approximation formulas for singular statistics for these models. The presented algorithms are implemented in Matlab and Maple and their usage is demonstrated for marginal likelihood approximations for Bayesian networks with hidden variables.

Robust Bayesian Optimization with Student-t Likelihood Machine Learning

Bayesian optimization has recently attracted the attention of the automatic machine learning community for its excellent results in hyperparameter tuning. BO is characterized by the sample efficiency with which it can optimize expensive black-box functions. The efficiency is achieved in a similar fashion to the learning to learn methods: surrogate models (typically in the form of Gaussian processes) learn the target function and perform intelligent sampling. This surrogate model can be applied even in the presence of noise; however, as with most regression methods, it is very sensitive to outlier data. This can result in erroneous predictions and, in the case of BO, biased and inefficient exploration. In this work, we present a GP model that is robust to outliers which uses a Student-t likelihood to segregate outliers and robustly conduct Bayesian optimization. We present numerical results evaluating the proposed method in both artificial functions and real problems.

Variational Bayesian Monte Carlo

Neural Information Processing Systems

Many probabilistic models of interest in scientific computing and machine learning have expensive, black-box likelihoods that prevent the application of standard techniques for Bayesian inference, such as MCMC, which would require access to the gradient or a large number of likelihood evaluations. We introduce here a novel sample-efficient inference framework, Variational Bayesian Monte Carlo (VBMC). VBMC combines variational inference with Gaussian-process based, active-sampling Bayesian quadrature, using the latter to efficiently approximate the intractable integral in the variational objective. Our method produces both a nonparametric approximation of the posterior distribution and an approximate lower bound of the model evidence, useful for model selection. We demonstrate VBMC both on several synthetic likelihoods and on a neuronal model with data from real neurons. Across all tested problems and dimensions (up to D = 10), VBMC performs consistently well in reconstructing the posterior and the model evidence with a limited budget of likelihood evaluations, unlike other methods that work only in very low dimensions. Our framework shows great promise as a novel tool for posterior and model inference with expensive, black-box likelihoods.