The article was written by Amber Zhou, a Financial Analyst at I Know First. Deep learning has become a buzzword in recent years. In fact, it first attracted considerable attention and excitement under the name neural networks back in the 1980s. However, due to the lack of sufficient compute power and training examples, interest gradually declined over the following decade. Now that we are entering the era of Big Data, and with the explosion of computing power, deep learning has seen a revival.
Variational inference for Bayesian deep neural networks (DNNs) requires specifying priors and approximate posterior distributions for neural network weights. Specifying meaningful weight priors is a challenging problem, particularly when scaling variational inference to deeper architectures with a high-dimensional weight space. We propose the Bayesian MOdel Priors Extracted from Deterministic DNN (MOPED) method for stochastic variational inference, which chooses meaningful prior distributions over the weight space using deterministic weights derived from pretrained DNNs of equivalent architecture. We evaluate the proposed approach on multiple datasets and real-world application domains, with model architectures of varying complexity, to demonstrate that MOPED enables scalable variational inference for Bayesian DNNs. The proposed method achieves faster training convergence and provides reliable uncertainty quantification, without compromising the accuracy provided by the deterministic DNNs. We also propose hybrid architectures for Bayesian DNNs in which deterministic and variational layers are combined to balance computational complexity during the prediction phase while still providing the benefits of Bayesian inference. We will release the source code for this work.
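The idea of taking priors from a pretrained deterministic network can be illustrated with a minimal sketch. This is not the authors' released code. The function name `moped_style_init`, the scaling factor `delta`, the unit prior standard deviation, and the softplus parameterization of the posterior standard deviation are all assumptions made for illustration: the prior and the initial posterior mean are both centered on the pretrained weights, and the initial posterior spread is set proportional to each weight's magnitude.

```python
import numpy as np


def softplus(x):
    """Map an unconstrained value to a positive standard deviation."""
    return np.log1p(np.exp(x))


def moped_style_init(pretrained_w, delta=0.1, prior_sigma=1.0):
    """Build Gaussian prior and initial posterior parameters from
    deterministic weights (hypothetical helper, assumptions in the lead-in).
    """
    w = np.asarray(pretrained_w, dtype=float)

    # Prior: Gaussian centered on the pretrained weights (assumed unit std).
    prior_mu = w.copy()
    prior_std = np.full_like(w, prior_sigma)

    # Initial posterior mean: start from the deterministic solution.
    post_mu = w.copy()

    # Initial posterior std: proportional to |w|, with a small floor so
    # that zero-valued weights still get a strictly positive std.
    post_std = np.maximum(delta * np.abs(w), 1e-6)

    # Store the std through the inverse softplus, so softplus(rho) == post_std.
    post_rho = np.log(np.expm1(post_std))
    return prior_mu, prior_std, post_mu, post_rho


# Example with a tiny "pretrained" weight matrix.
w_pretrained = np.array([[0.5, -2.0], [0.0, 1.0]])
prior_mu, prior_std, post_mu, post_rho = moped_style_init(w_pretrained)
initial_std = softplus(post_rho)
```

Starting the variational posterior near the deterministic solution is one plausible reason for the faster convergence reported above; the sketch only shows the initialization, not the training loop.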
We present a novel approach for training deep neural networks in a Bayesian way. Classical, i.e. non-Bayesian, deep learning has two major drawbacks, both originating from the fact that network parameters are considered to be deterministic. First, model uncertainty cannot be measured, which limits the use of deep learning in many fields of application. Second, training of deep neural networks is often hampered by overfitting. The proposed approach uses variational inference to approximate the intractable a posteriori distribution on the basis of a normal prior. The variational density is designed in such a way that the a posteriori uncertainty of the network parameters is represented per network layer and depends on the estimated parameter expectation values. This way, only a few additional parameters need to be optimized compared to a non-Bayesian network. We apply this Bayesian approach to train and test the LeNet architecture on the MNIST dataset. Compared to classical deep learning, the test error is reduced by 15%. In addition, the trained model contains information about the parameter uncertainty in each layer. We show that this information can be used to calculate credible intervals for the prediction and to optimize the network architecture for a given training data set.
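To make the per-layer uncertainty and the credible intervals more concrete, here is a rough sketch under stated assumptions. The abstract does not give the exact form of the variational density, so the choice below (one scalar `tau` per layer, with weight standard deviation `tau * |mu|`) is only one possible reading. The tiny tanh network, the function names, and the Monte Carlo percentile interval are likewise illustrative, not the authors' method.

```python
import numpy as np


def sample_layer(mu, tau, rng):
    """Draw one weight sample for a layer.

    Assumption: a single uncertainty parameter `tau` per layer, with the
    standard deviation of each weight equal to tau * |mu|.
    """
    return mu + tau * np.abs(mu) * rng.standard_normal(mu.shape)


def predictive_interval(x, layers, n_samples=1000, level=0.95, seed=0):
    """Monte Carlo estimate of the predictive mean and a credible interval.

    `layers` is a list of (mu, tau) pairs; hidden layers use tanh.
    """
    rng = np.random.default_rng(seed)
    outputs = []
    for _ in range(n_samples):
        h = x
        for i, (mu, tau) in enumerate(layers):
            h = h @ sample_layer(mu, tau, rng)
            if i < len(layers) - 1:  # no nonlinearity on the output layer
                h = np.tanh(h)
        outputs.append(h)
    outputs = np.stack(outputs)

    # Equal-tailed interval from the empirical percentiles of the samples.
    lower_q = 100.0 * (1.0 - level) / 2.0
    upper_q = 100.0 * (1.0 + level) / 2.0
    lo, hi = np.percentile(outputs, [lower_q, upper_q], axis=0)
    return outputs.mean(axis=0), lo, hi


# Example: a 2-2-1 network with 20% relative uncertainty in each layer.
x = np.array([[1.0, 2.0]])
w1 = np.array([[0.5, -1.0], [0.3, 0.8]])
w2 = np.array([[1.0], [-0.5]])
mean, lo, hi = predictive_interval(x, [(w1, 0.2), (w2, 0.2)], n_samples=200)
```

Because only one extra scalar per layer is introduced in this sketch, the number of additional parameters stays small, which matches the spirit of the claim above.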
Deep learning provides a flexible framework for function approximation and, as a result, deep models have become a standard approach in many domains including machine vision, natural language processing, speech recognition, bioinformatics, and game-playing [LeCun et al., 2015]. However, deep models tend to overfit when the number of training examples is small; furthermore, in practice, the primary focus in deep learning is often on computing point estimates of model parameters, and thus these models do not provide uncertainties for their predictions - making them unsuitable for applications in critical domains such as personalized medicine. Bayesian neural networks (BNNs) promise to address these issues by modeling the uncertainty in the network weights and, correspondingly, the uncertainty in output predictions [MacKay, 1992b, Neal, 2012]. Unfortunately, characterizing uncertainty over the parameters of modern neural networks in a Bayesian setting is challenging due to the high dimensionality of the weight space and complex patterns of dependencies among the weights. In these cases, Markov-chain Monte Carlo (MCMC) techniques for performing inference often fail to mix across the weight space, and standard variational approaches not only struggle to escape local optima, but also fail to capture dependencies between the weights. A recent body of work has attempted to improve the quality of inference for BNNs via improved approximate inference methods [Graves, 2011, Blundell et al., 2015, Hernández-Lobato et al., 2016], or by improving the flexibility of the variational approximation for variational inference [Gershman et al., 2012, Ranganath et al., 2016, Louizos and Welling, 2017]. In this work, we introduce a novel approach in which we remove potential redundancies in neural network parameters by learning a nonlinear projection of the weights onto a low-dimensional latent space.
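The relationship between a low-dimensional latent space and the full weight space can be sketched as follows. This is a hypothetical illustration, not the model from this work: the one-hidden-layer tanh decoder, its parameters `A`, `b`, `B`, `c`, the diagonal Gaussian over the latent variable `z`, and all dimensions are assumptions. It only shows the direction in which samples are generated (latent to weights), with uncertainty placed on a handful of latent dimensions instead of on every weight.

```python
import numpy as np


def decode_weights(z, A, b, B, c):
    """Hypothetical nonlinear map from latent codes to weight vectors."""
    return np.tanh(z @ A + b) @ B + c


def sample_weights(mu_z, sigma_z, decoder_params, n_samples, seed=0):
    """Sample latent codes from a diagonal Gaussian, then decode them."""
    rng = np.random.default_rng(seed)
    z = mu_z + sigma_z * rng.standard_normal((n_samples, mu_z.size))
    return decode_weights(z, *decoder_params)


# Assumed sizes: 2 latent dimensions, 8 hidden units, 50 network weights.
latent_dim, hidden_dim, weight_dim = 2, 8, 50
init_rng = np.random.default_rng(42)
decoder_params = (
    init_rng.standard_normal((latent_dim, hidden_dim)),  # A
    np.zeros(hidden_dim),                                # b
    init_rng.standard_normal((hidden_dim, weight_dim)),  # B
    np.zeros(weight_dim),                                # c
)

mu_z = np.zeros(latent_dim)
sigma_z = np.full(latent_dim, 0.5)
weight_samples = sample_weights(mu_z, sigma_z, decoder_params, n_samples=10)
```

In this sketch the 50 weights in each sample are all driven by the same 2 latent values, so dependencies among the weights come from the decoder even though the distribution over `z` is a simple diagonal Gaussian.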
Our approach takes advantage of the following insight: learning (standard network) parameters is easier in the high-dimensional space, but characterizing (Bayesian) uncertainty is easier in the low-dimensional space. Low-dimensional spaces are generally easier to explore, especially if we have fewer correlations between dimensions, and can be better captured by standard variational approximations (e.g.
The main inspiration for this blog post is the work I did on Bayesian Neural Networks with my friend Brian Trippe at the Computational and Biological Learning Lab at Cambridge University. I highly recommend reading Brian's thesis on variational inference in neural networks. Disclaimer: At the Computational and Biological Learning Lab, Bayesian machine learning techniques are unapologetically taught as the way forward. As such, be aware of potential bias in this blog post. For example, in image classification, x represents an image and y the corresponding image label.