Exploiting gradients and Hessians in Bayesian optimization and Bayesian quadrature

arXiv.org Machine Learning

An exciting branch of machine learning research focuses on methods for learning, optimizing, and integrating unknown functions that are difficult or costly to evaluate. A popular Bayesian approach to this problem uses a Gaussian process (GP) to construct a posterior distribution over the function of interest given a set of observed measurements, and selects new points to evaluate using the statistics of this posterior. Here we extend these methods to exploit derivative information from the unknown function. We describe methods for Bayesian optimization (BO) and Bayesian quadrature (BQ) in settings where first and second derivatives may be evaluated along with the function itself. We perform sampling-based inference in order to incorporate uncertainty over hyperparameters, and show that both hyperparameter and function uncertainty decrease much more rapidly when using derivative information. Moreover, we introduce techniques for overcoming ill-conditioning issues that have plagued earlier methods for gradient-enhanced Gaussian processes and kriging. We illustrate the efficacy of these methods using applications to real and simulated Bayesian optimization and quadrature problems, and show that exploiting derivatives can provide substantial gains over standard methods.
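
As a rough illustration of why derivative observations shrink function uncertainty, the sketch below conditions a one-dimensional GP on function values alone and then on function values plus first derivatives. It is not the paper's method: the squared-exponential kernel, the fixed length scale `ell`, the diagonal `jitter` used as a generic numerical stabilizer, and the helper names `rbf_blocks` and `gp_posterior` are all assumptions made for this example, and hyperparameters are held fixed rather than sampled.

```python
import numpy as np

def rbf_blocks(xa, xb, ell=1.0):
    """Squared-exponential kernel and its derivative cross-covariances (1-D)."""
    d = xa[:, None] - xb[None, :]
    k = np.exp(-0.5 * d**2 / ell**2)            # cov(f(xa), f(xb))
    k_fd = k * d / ell**2                       # cov(f(xa), f'(xb))
    k_dd = k * (1.0 / ell**2 - d**2 / ell**4)   # cov(f'(xa), f'(xb))
    return k, k_fd, k_dd

def gp_posterior(x_train, y, x_test, dy=None, ell=1.0, jitter=1e-6):
    """Posterior mean and variance of f at x_test; dy holds optional gradients."""
    k, k_fd, k_dd = rbf_blocks(x_train, x_train, ell)
    ks, ks_fd, _ = rbf_blocks(x_test, x_train, ell)
    if dy is None:
        K, Ks, obs = k, ks, y
    else:
        # Joint covariance of the stacked vector [f(X); f'(X)].
        K = np.block([[k, k_fd], [k_fd.T, k_dd]])
        Ks = np.hstack([ks, ks_fd])
        obs = np.concatenate([y, dy])
    # Assumed stabilizer: a small diagonal term, not the paper's technique.
    K = K + jitter * np.eye(K.shape[0])
    L = np.linalg.cholesky(K)
    mean = Ks @ np.linalg.solve(K, obs)
    v = np.linalg.solve(L, Ks.T)
    var = 1.0 - np.sum(v**2, axis=0)            # prior variance k(x, x) = 1
    return mean, var

x_train = np.array([0.0, 1.5, 3.0])
x_test = np.linspace(0.0, 3.0, 31)
y, dy = np.sin(x_train), np.cos(x_train)

mean_f, var_f = gp_posterior(x_train, y, x_test)          # values only
mean_d, var_d = gp_posterior(x_train, y, x_test, dy=dy)   # values + gradients
print("max variance, values only:     ", var_f.max())
print("max variance, with derivatives:", var_d.max())
```

Because conditioning a Gaussian on additional observations cannot increase its variance, the second posterior is never wider than the first at any test point; how much narrower it gets depends on the kernel and the length scale chosen here.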

Combining independent evidence using a Bayesian approach but without standard Bayesian updating?


I have made some progress with my work on combining independent evidence using a Bayesian approach but eschewing standard Bayesian updating. I found a neat analytical way of doing this, to a very good approximation, in cases where each estimate of a parameter corresponds to the ratio of two variables each determined with normal error, the fractional uncertainty in the numerator and denominator variables differing between the types of evidence. This seems a not uncommon situation in science, and it is a good approximation to that which exists when estimating climate sensitivity. I have had a manuscript in which I develop and test this method accepted by the Journal of Statistical Planning and Inference (for a special issue on Confidence Distributions edited by Tore Schweder and Nils Hjort). Frequentist coverage is almost exact using my analytical solution, based on combining Jeffreys' priors in quadrature, whereas Bayesian updating produces far poorer probability matching.

A Bayesian Metareasoner for Algorithm Selection for Real-time Bayesian Network Inference Problems

AAAI Conferences

Bayesian network (BN) inference has long been seen as a very important and hard problem in AI. To date researchers have developed many different kinds of exact and approximate BN inference algorithms. Each of these has different properties and works better for different classes of inference problems. Given a BN inference problem instance, it is usually hard but important to decide in advance which algorithm among a set of choices is the most appropriate. This problem is known as the algorithm selection problem [Ri76].

Bayesian Basics, Explained


Editor's note: The following is an interview with Columbia University Professor Andrew Gelman conducted by marketing scientist Kevin Gray, in which Gelman spells out the ABCs of Bayesian statistics. Kevin Gray: Most marketing researchers have heard of Bayesian statistics but know little about it. Can you briefly explain in layperson's terms what it is and how it differs from the 'ordinary' statistics most of us learned in college? Andrew Gelman: Bayesian statistics uses the mathematical rules of probability to combine data with "prior information" to give inferences which (if the model being used is correct) are more precise than would be obtained by either source of information alone. Classical statistical methods avoid prior distributions.
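
The claim that combining data with prior information gives more precise inferences than either source alone can be made concrete in the simplest textbook case, a normal prior combined with a normally distributed estimate. The sketch below is not taken from the interview: the function name `combine_normal` and the numbers are made up for illustration, and the result holds only if the normal model is correct.

```python
import math

def combine_normal(prior_mean, prior_sd, data_mean, data_se):
    """Posterior for a normal mean given a normal prior and a normal estimate."""
    prior_prec = 1.0 / prior_sd**2      # precision = 1 / variance
    data_prec = 1.0 / data_se**2
    post_prec = prior_prec + data_prec  # precisions add
    # Posterior mean is the precision-weighted average of the two sources.
    post_mean = (prior_prec * prior_mean + data_prec * data_mean) / post_prec
    post_sd = math.sqrt(1.0 / post_prec)
    return post_mean, post_sd

# Illustrative numbers: a vague prior centred at 0, data estimate 1 with s.e. 1.
post_mean, post_sd = combine_normal(prior_mean=0.0, prior_sd=2.0,
                                    data_mean=1.0, data_se=1.0)
print(post_mean, post_sd)
```

Since the posterior precision is the sum of the two precisions, the posterior standard deviation is always smaller than both the prior standard deviation and the standard error of the data.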