An exciting branch of machine learning research focuses on methods for learning, optimizing, and integrating unknown functions that are difficult or costly to evaluate. A popular Bayesian approach to this problem uses a Gaussian process (GP) to construct a posterior distribution over the function of interest given a set of observed measurements, and selects new points to evaluate using the statistics of this posterior. Here we extend these methods to exploit derivative information from the unknown function. We describe methods for Bayesian optimization (BO) and Bayesian quadrature (BQ) in settings where first and second derivatives may be evaluated along with the function itself. We perform sampling-based inference in order to incorporate uncertainty over hyperparameters, and show that both hyperparameter and function uncertainty decrease much more rapidly when using derivative information. Moreover, we introduce techniques for overcoming ill-conditioning issues that have plagued earlier methods for gradient-enhanced Gaussian processes and kriging. We illustrate the efficacy of these methods using applications to real and simulated Bayesian optimization and quadrature problems, and show that exploting derivatives can provide substantial gains over standard methods.
I have made some progress with my work on combining independent evidence using a Bayesian approach but eschewing standard Bayesian updating. I found a neat analytical way of doing this, to a very good approximation, in cases where each estimate of a parameter corresponds to the ratio of two variables each determined with normal error, the fractional uncertainty in the numerator and denominator variables differing between the types of evidence. This seems a not uncommon situation in science, and it is a good approximation to that which exists when estimating climate sensitivity. I have had a manuscript in which I develop and test this method accepted by the Journal of Statistical Planning and Inference (for a special issue on Confidence Distributions edited by Tore Schweder and Nils Hjort). Frequentist coverage is almost exact using my analytical solution, based on combining Jeffreys' priors in quadrature, whereas Bayesian updating produces far poorer probability matching.
Bayesian network (BN) inference has long been seen as a very important and hard problem in AI. To date researchers have developed many different kinds of exact and approximate BN inference algorithms. Each of these has different properties and works better for different classes of inference problems. Given a BN inference problem instance, it is usually hard but important to decide in advance which algorithm among a set of choices is the most appropriate. This problem is known as the algorithm selection problem [Ri76].
We develop a closed form asymptotic formula to compute the marginal likelihood of data given a naive Bayesian network model with two hidden states and binary features. This formula deviates from the standard BIC score. Our work provides a concrete example that the BIC score is generally not valid for statistical models that belong to a stratified exponential family. This stands in contrast to linear and curved exponential families, where the BIC score has been proven to provide a correct approximation for the marginal likelihood.