Bayesian Neural Network Priors Revisited
Vincent Fortuin, Adrià Garriga-Alonso, Florian Wenzel, Gunnar Rätsch, Richard Turner, Mark van der Wilk, Laurence Aitchison
In a Bayesian neural network (BNN), we specify a prior p(w) over the neural network parameters, and compute the posterior distribution over parameters conditioned on training data, p(w | x, y) = p(y | w, x) p(w) / p(y | x). This procedure should give considerable advantages for reasoning about predictive uncertainty, which is especially relevant in the small-data setting. Crucially, to perform Bayesian inference, we need to choose a prior that accurately reflects our beliefs about the parameters before seeing any data (Bayes, 1763; Gelman et al., 2013). However, the most common choice of the prior for BNN weights is the simplest one: the isotropic Gaussian. Isotropic Gaussians are used across almost all fields of Bayesian deep learning, ranging from variational inference (Blundell et al., 2015; Dusenberry et al., 2020), to sampling-based inference (Zhang et al., 2019), and even to infinite networks (Lee et al., 2017; Garriga-Alonso et al., 2019). This is troubling, since isotropic Gaussian priors are almost certainly not the best choice. Indeed, despite the progress on more accurate and efficient inference procedures, in most settings, the posterior predictive of BNNs using a Gaussian prior still leads to worse predictive performance than a baseline obtained by training the network with standard stochastic gradient descent (SGD) (e.g., Zhang et al., 2019; Heek & Kalchbrenner, 2019; Wenzel et al., 2020a). However, it has been shown that the performance of BNNs can be improved by artificially reducing posterior uncertainty using "cold posteriors" (Wenzel et al., 2020a).
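To make the quantities in the abstract concrete, here is a minimal sketch (not the authors' code) of the unnormalized log-posterior with an isotropic Gaussian prior, using a toy logistic-regression "network" for the likelihood; the `temperature` parameter is an assumption illustrating the cold-posterior idea, where T < 1 sharpens the posterior.

```python
import numpy as np

def log_isotropic_gaussian_prior(w, sigma=1.0):
    # log p(w) under an isotropic Gaussian prior N(0, sigma^2 I) over all weights.
    return -0.5 * np.sum(w ** 2) / sigma ** 2 \
           - 0.5 * w.size * np.log(2 * np.pi * sigma ** 2)

def log_likelihood(w, x, y):
    # Bernoulli log-likelihood log p(y | w, x) for a toy linear "network";
    # a real BNN would use a deep network here.
    logits = x @ w
    return np.sum(y * logits - np.log1p(np.exp(logits)))

def log_posterior_unnormalized(w, x, y, sigma=1.0, temperature=1.0):
    # log p(w | x, y) up to the constant log p(y | x):
    # (log p(y | w, x) + log p(w)) / T, where T < 1 gives a "cold posterior".
    return (log_likelihood(w, x, y) + log_isotropic_gaussian_prior(w, sigma)) / temperature

# Toy usage: evaluate the unnormalized log-posterior at a random weight vector.
rng = np.random.default_rng(0)
x = rng.normal(size=(20, 3))
y = (x @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)
w = rng.normal(size=3)
print(log_posterior_unnormalized(w, x, y, sigma=1.0, temperature=0.5))
```

This quantity is what sampling-based inference (e.g., SGLD/SGHMC) targets and what variational inference bounds; the paper's question is whether the Gaussian form of `log_isotropic_gaussian_prior` is a good choice in the first place.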
Feb-12-2021