Combining independent evidence using a Bayesian approach but without standard Bayesian updating?


I have made some progress with my work on combining independent evidence using a Bayesian approach but eschewing standard Bayesian updating. I found a neat analytical way of doing this, to a very good approximation, in cases where each estimate of a parameter corresponds to the ratio of two variables each determined with normal error, the fractional uncertainty in the numerator and denominator variables differing between the types of evidence. This seems a not uncommon situation in science, and it is a good approximation to that which exists when estimating climate sensitivity. I have had a manuscript in which I develop and test this method accepted by the Journal of Statistical Planning and Inference (for a special issue on Confidence Distributions edited by Tore Schweder and Nils Hjort). Frequentist coverage is almost exact using my analytical solution, based on combining Jeffreys' priors in quadrature, whereas Bayesian updating produces far poorer probability matching.

Exploiting gradients and Hessians in Bayesian optimization and Bayesian quadrature Machine Learning

An exciting branch of machine learning research focuses on methods for learning, optimizing, and integrating unknown functions that are difficult or costly to evaluate. A popular Bayesian approach to this problem uses a Gaussian process (GP) to construct a posterior distribution over the function of interest given a set of observed measurements, and selects new points to evaluate using the statistics of this posterior. Here we extend these methods to exploit derivative information from the unknown function. We describe methods for Bayesian optimization (BO) and Bayesian quadrature (BQ) in settings where first and second derivatives may be evaluated along with the function itself. We perform sampling-based inference in order to incorporate uncertainty over hyperparameters, and show that both hyperparameter and function uncertainty decrease much more rapidly when using derivative information. Moreover, we introduce techniques for overcoming ill-conditioning issues that have plagued earlier methods for gradient-enhanced Gaussian processes and kriging. We illustrate the efficacy of these methods using applications to real and simulated Bayesian optimization and quadrature problems, and show that exploting derivatives can provide substantial gains over standard methods.

Learning Bounds for a Generalized Family of Bayesian Posterior Distributions

Neural Information Processing Systems

In this paper we obtain convergence bounds for the concentration of Bayesian posterior distributions (around the true distribution) using a novel method that simplifies and enhances previous results. Based on the analysis, we also introduce a generalized family of Bayesian posteriors, and show that the convergence behavior of these generalized posteriors is completely determined by the local prior structure around the true distribution. Thisimportant and surprising robustness property does not hold for the standard Bayesian posterior in that it may not concentrate when there exist "bad" prior structures even at places far away from the true distribution.

A Bayesian Metareasoner for Algorithm Selection for Real-time Bayesian Network Inference Problems

AAAI Conferences

Bayesian network (BN) inference has long been seen as a very important and hard problem in AI. To date researchers have developed many different kinds of exact and approximate BN inference algorithms. Each of these has different properties and works better for different classes of inference problems. Given a BN inference problem instance, it is usually hard but important to decide in advance which algorithm among a set of choices is the most appropriate. This problem is known as the algorithm selection problem [Ri76].