Pawlowski, Nick, Jaques, Miguel, Glocker, Ben

In this work, we perform outlier detection using ensembles of neural networks obtained by variational approximation of the posterior in a Bayesian neural network setting. The variational parameters are obtained by sampling from the true posterior by gradient descent. We show that our outlier detection results are comparable to those obtained using other efficient ensembling methods.

Pearce, Tim, Zaki, Mohamed, Neely, Andy

Ensembles of neural networks (NNs) have long been used to estimate predictive uncertainty (Tibshirani, 1996; Heskes, 1996): a small number of NNs are trained from different initialisations, and sometimes on differing versions of the dataset. The variance of the ensemble's predictions is interpreted as its epistemic uncertainty. The appeal of ensembles stems from their being collections of regular NNs, which makes them both scalable and easy to implement. NN ensembles have continued to achieve strong empirical results in recent years, for example in Lakshminarayanan et al. (2017), where they were presented as a practical alternative to more costly Bayesian NNs (BNNs). The departure from Bayesian methodology is of concern, since the Bayesian framework provides a principled, widely accepted approach to handling uncertainty. Several recent works have explored links between ensembles and Bayesian inference. Variants of an ensembling scheme known to be consistent for Bayesian linear regression have been applied directly to NNs (Lu and Van Roy, 2017; Osband et al., 2017). In this extended abstract we derive and implement a modified ensembling scheme specifically for NNs, which provides a consistent estimator of the Bayesian posterior in wide NNs by regularising parameters about values drawn from a prior distribution.
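To make the linear-regression case concrete, the following is a minimal sketch of an ensembling scheme of the kind that is consistent for Bayesian linear regression. It is not the modified NN scheme of the abstract: it assumes a one-parameter model y = w*x + noise with a Gaussian prior and known noise variance, and each member both perturbs the targets and regularises w about an anchor drawn from the prior. The function names, data, and hyperparameter values are illustrative assumptions.

```python
import random
import statistics


def posterior_1d(xs, ys, noise_var, prior_mean, prior_var):
    """Analytic posterior N(mean, var) over w for y = w*x + Gaussian noise."""
    precision = sum(x * x for x in xs) / noise_var + 1.0 / prior_var
    var = 1.0 / precision
    mean = var * (sum(x * y for x, y in zip(xs, ys)) / noise_var
                  + prior_mean / prior_var)
    return mean, var


def anchored_member(xs, ys, noise_var, prior_mean, prior_var, rng):
    """Fit one ensemble member by minimising
        sum_i (y_i + eps_i - w*x_i)^2 / noise_var + (w - w_anchor)^2 / prior_var
    with eps_i ~ N(0, noise_var) and w_anchor ~ N(prior_mean, prior_var).
    For this linear model the minimiser has a closed form."""
    w_anchor = rng.gauss(prior_mean, prior_var ** 0.5)
    noisy_ys = [y + rng.gauss(0.0, noise_var ** 0.5) for y in ys]
    precision = sum(x * x for x in xs) / noise_var + 1.0 / prior_var
    return (sum(x * y for x, y in zip(xs, noisy_ys)) / noise_var
            + w_anchor / prior_var) / precision


def run_ensemble(n_members=20000, seed=0):
    """Compare the ensemble's mean and variance with the exact posterior."""
    rng = random.Random(seed)
    xs = [0.5, 1.0, 1.5, 2.0]          # illustrative inputs (assumption)
    ys = [1.1, 1.9, 3.2, 3.9]          # illustrative targets (assumption)
    noise_var, prior_mean, prior_var = 0.25, 0.0, 1.0
    members = [anchored_member(xs, ys, noise_var, prior_mean, prior_var, rng)
               for _ in range(n_members)]
    ensemble = (statistics.fmean(members), statistics.variance(members))
    exact = posterior_1d(xs, ys, noise_var, prior_mean, prior_var)
    return ensemble, exact
```

In this linear setting each member is an exact posterior sample, so the ensemble mean and variance returned by `run_ensemble()` should approach the analytic values as the number of members grows. For NNs no such closed form exists, and each member would instead be trained by gradient descent on the corresponding regularised loss.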

Rockova, Veronika, Saha, Enakshi

Ensemble learning is a statistical paradigm built on the premise that many weak learners can perform exceptionally well when deployed collectively. The BART method of Chipman et al. (2010) is a prominent example of Bayesian ensemble learning, where each learner is a tree. Due to its impressive performance, BART has received a lot of attention from practitioners. Despite its wide popularity, however, theoretical studies of BART have begun emerging only very recently. Laying the foundations for the theoretical analysis of Bayesian forests, Rockova and van der Pas (2017) showed optimal posterior concentration under conditionally uniform tree priors. These priors deviate from the actual priors implemented in BART. Here, we study the exact BART prior and propose a simple modification so that it also enjoys optimality properties. To this end, we dive into branching process theory. We obtain tail bounds for the distribution of total progeny under heterogeneous Galton-Watson (GW) processes exploiting their connection to random walks. We conclude with a result stating the optimal rate of posterior convergence for BART.
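The link between tree priors and branching processes can be illustrated by simulation. The sketch below assumes the standard BART split rule of Chipman et al. (2010), in which a node at depth d splits into two children with probability alpha * (1 + d)^(-beta), and estimates the tail of the total number of nodes (the total progeny) by Monte Carlo. It illustrates the quantity being bounded, not the paper's analytical tail bounds or its proposed modification; the function names and the sample size are assumptions.

```python
import random


def split_probability(depth, alpha=0.95, beta=2.0):
    """Probability that a node at the given depth splits (BART-style prior)."""
    return alpha * (1.0 + depth) ** (-beta)


def sample_tree_size(rng, alpha=0.95, beta=2.0, max_nodes=10_000):
    """Sample one binary tree and return its total number of nodes.

    This is the total progeny of a heterogeneous Galton-Watson process:
    the offspring distribution (0 or 2 children) depends on the depth.
    max_nodes is a safety cap for parameter choices that decay slowly.
    """
    frontier = [0]   # depths of nodes whose split decision is still pending
    total = 1        # the root
    while frontier and total < max_nodes:
        depth = frontier.pop()
        if rng.random() < split_probability(depth, alpha, beta):
            frontier.extend([depth + 1, depth + 1])
            total += 2
    return total


def tail_estimate(threshold, n_trees=20000, seed=0, alpha=0.95, beta=2.0):
    """Monte Carlo estimate of P(total progeny > threshold)."""
    rng = random.Random(seed)
    sizes = [sample_tree_size(rng, alpha, beta) for _ in range(n_trees)]
    return sum(size > threshold for size in sizes) / n_trees
```

Calling `tail_estimate(k)` for increasing `k` gives an empirical picture of how quickly the prior mass on large trees decays, which is the behaviour the tail bounds describe.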

In this paper, we propose Ensemble Bayesian Optimization (EBO) to overcome this problem. Unlike conventional BO methods that operate on a single posterior GP model, EBO works with an ensemble of posterior GP models. Our approach generates speedups by parallelizing the time-consuming hyper-parameter posterior inference and function evaluations on hundreds of cores and aggregating the models in every iteration of BO. We demonstrate the ability of EBO to handle sample-intensive hard optimization problems by applying it to a rover navigation problem with tens of thousands of observations.

Yao, Jiayu, Pan, Weiwei, Ghosh, Soumya, Doshi-Velez, Finale

Bayesian Neural Networks (BNNs) place priors over the parameters in a neural network. Inference in BNNs, however, is difficult; all inference methods for BNNs are approximate. In this work, we empirically compare the quality of predictive uncertainty estimates for 10 common inference

There exists a large body of work to improve the quality of inference for Bayesian neural networks (BNNs) by improving the approximate inference procedure (e.g. Graves 2011; Blundell et al. 2015; Hernández-Lobato et al. 2016, to name a few), or by improving the flexibility of the variational approximation for variational inference (e.g.