Bayesian Hypernetworks
Krueger, David, Huang, Chin-Wei, Islam, Riashat, Turner, Ryan, Lacoste, Alexandre, Courville, Aaron
We propose Bayesian hypernetworks: a framework for approximate Bayesian inference in neural networks. A Bayesian hypernetwork, $h$, is a neural network which learns to transform a simple noise distribution, $p(\epsilon) = \mathcal{N}(0,I)$, to a distribution $q(\theta) \doteq q(h(\epsilon))$ over the parameters $\theta$ of another neural network (the "primary network"). We train $q$ with variational inference, using an invertible $h$ to enable efficient estimation of the variational lower bound on the posterior $p(\theta | \mathcal{D})$ via sampling. In contrast to most methods for Bayesian deep learning, Bayesian hypernets can represent a complex multimodal approximate posterior with correlations between parameters, while enabling cheap i.i.d. sampling of $q(\theta)$. We demonstrate these qualitative advantages of Bayesian hypernets, which also achieve competitive performance on a suite of tasks that demonstrate the advantage of estimating model uncertainty, including active learning and anomaly detection.
Oct-12-2017