bnn
Binarized Neural Networks
We introduce a method to train Binarized Neural Networks (BNNs), neural networks with binary weights and activations at run-time. At train time, the binary weights and activations are used for computing the parameter gradients. During the forward pass, BNNs drastically reduce memory size and accesses, and replace most arithmetic operations with bit-wise operations, which is expected to substantially improve power efficiency. To validate the effectiveness of BNNs, we conducted two sets of experiments on the Torch7 and Theano frameworks. On both, BNNs achieved nearly state-of-the-art results on the MNIST, CIFAR-10 and SVHN datasets. We also report our preliminary results on the challenging ImageNet dataset. Last but not least, we wrote a binary matrix multiplication GPU kernel with which it is possible to run our MNIST BNN 7 times faster than with an unoptimized GPU kernel, without suffering any loss in classification accuracy. The code for training and running our BNNs is available online.
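The claim that binary weights and activations let most arithmetic be replaced by bit-wise operations can be illustrated with the standard XNOR/popcount trick. The sketch below is a minimal illustration, not the authors' GPU kernel: it assumes deterministic sign binarization (values ≥ 0 map to +1), packs each ±1 vector into an integer, and recovers the dot product from the number of matching bits. The helper names `binarize`, `pack_bits` and `xnor_popcount_dot` are hypothetical.

```python
def binarize(values):
    """Deterministic sign binarization: x >= 0 -> +1, x < 0 -> -1."""
    return [1 if v >= 0 else -1 for v in values]


def pack_bits(signs):
    """Pack a list of +1/-1 values into an int: bit i is set when signs[i] == +1."""
    word = 0
    for i, s in enumerate(signs):
        if s == 1:
            word |= 1 << i
    return word


def xnor_popcount_dot(a_bits, b_bits, n):
    """Dot product of two length-n {-1,+1} vectors from their packed bits.

    XNOR marks the positions where the signs agree. Each agreement contributes
    +1 and each disagreement -1, so dot = matches - (n - matches) = 2*matches - n.
    """
    mask = (1 << n) - 1                      # keep only the n valid bit positions
    matches = bin(~(a_bits ^ b_bits) & mask).count("1")
    return 2 * matches - n


# Usage: compare with the ordinary multiply-accumulate dot product.
a = binarize([0.3, -1.2, 0.7, -0.1])         # [1, -1, 1, -1]
b = binarize([0.5, 0.4, -2.0, -0.3])         # [1, 1, -1, -1]
print(xnor_popcount_dot(pack_bits(a), pack_bits(b), len(a)))  # → 0
print(sum(x * y for x, y in zip(a, b)))                       # → 0
```

On hardware, one XOR/XNOR instruction plus one popcount replaces up to a machine word's worth of multiply-adds, which is where the memory and power savings described above come from.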
Dirichlet Scale Mixture Priors for Bayesian Neural Networks
Arnstad, August; Rønneberg, Leiv; Storvik, Geir
Neural networks are the cornerstone of modern machine learning, yet they can be difficult to interpret, give overconfident predictions and are vulnerable to adversarial attacks. Bayesian neural networks (BNNs) alleviate some of these limitations, but have problems of their own. The key step of specifying prior distributions in BNNs is no trivial task, yet it is often skipped out of convenience. In this work, we propose a new class of prior distributions for BNNs, the Dirichlet scale mixture (DSM) prior, that addresses current limitations in Bayesian neural networks through structured, sparsity-inducing shrinkage. Theoretically, we derive general dependence structures and shrinkage results for DSM priors and show how they manifest under the geometry induced by neural networks. In experiments on simulated and real-world data, we find that DSM priors encourage sparse networks through implicit feature selection, show robustness under adversarial attacks and deliver competitive predictive performance with substantially fewer effective parameters. In particular, their advantages appear most pronounced in correlated, moderately small data regimes, and the resulting networks are more amenable to weight pruning. Moreover, by adopting heavy-tailed shrinkage mechanisms, our approach aligns with recent findings that such priors can mitigate the cold posterior effect, offering a principled alternative to the commonly used Gaussian priors.
- North America > United States > New York > New York County > New York City (0.14)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- Europe > Norway > Eastern Norway > Oslo (0.04)
- North America > Canada > Ontario > Toronto (0.14)
- Oceania > Australia > South Australia > Adelaide (0.04)
- Europe > United Kingdom > England > Surrey (0.04)
- Asia > Vietnam (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
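The sparsity-inducing shrinkage described in the DSM abstract can be made concrete with a generic Dirichlet scale mixture draw. This is a minimal sketch under stated assumptions, not the paper's exact prior: it assumes each weight is drawn as w_j ~ N(0, φ_j τ²), where the variance shares φ ~ Dirichlet(α, ..., α) sum to one and the global scale τ is held fixed for simplicity (a full hierarchical prior would place a distribution on τ and may group the weights). The function names are hypothetical.

```python
import random


def sample_dirichlet(alpha, n, rng):
    """Draw phi ~ Dirichlet(alpha, ..., alpha) by normalising n Gamma(alpha, 1) draws."""
    g = [rng.gammavariate(alpha, 1.0) for _ in range(n)]
    total = sum(g)
    return [x / total for x in g]


def sample_dsm_weights(n, alpha, tau, rng):
    """Hypothetical DSM-style draw: w_j ~ N(0, phi_j * tau^2), phi ~ Dir(alpha).

    The shares phi_j split a fixed total variance tau^2 across the n weights,
    so a small alpha concentrates that variance on a few weights (sparsity),
    while a large alpha spreads it almost evenly.
    """
    phi = sample_dirichlet(alpha, n, rng)
    weights = [rng.gauss(0.0, tau * p ** 0.5) for p in phi]
    return phi, weights


# Usage: compare how concentrated the variance shares are for small vs large alpha.
rng = random.Random(0)
phi_sparse, w_sparse = sample_dsm_weights(100, 0.1, 1.0, rng)
phi_dense, w_dense = sample_dsm_weights(100, 10.0, 1.0, rng)
print("largest share, alpha=0.1:", max(phi_sparse))
print("largest share, alpha=10: ", max(phi_dense))
```

The concentration parameter α acts as the sparsity knob: because the shares must sum to one, giving variance to some weights necessarily takes it away from others, which is the dependence between weights that a plain i.i.d. Gaussian prior lacks.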
Implicit Variational Inference for High-Dimensional Posteriors
In variational inference, the benefits of Bayesian models rely on accurately capturing the true posterior distribution. We propose using neural samplers that specify implicit distributions, which are well-suited for approximating complex multimodal and correlated posteriors in high-dimensional spaces.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.88)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)
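To illustrate what "neural samplers that specify implicit distributions" means, the sketch below pushes Gaussian noise through a small randomly initialised MLP to produce a parameter vector. It is a minimal sketch under assumptions, not the paper's architecture: the class name `NeuralSampler`, the tanh hidden layer and the layer sizes are hypothetical, and the training objective (including how the intractable entropy of q is handled) is omitted entirely.

```python
import math
import random


class NeuralSampler:
    """Hypothetical implicit distribution q(w): w = W2 @ tanh(W1 @ eps + b1) + b2,
    with eps ~ N(0, I).

    Drawing samples is cheap, but q(w) has no closed-form density, which is what
    makes the distribution implicit. Every output coordinate depends on the same
    noise and hidden units, so q can express correlations between coordinates
    that a mean-field Gaussian cannot.
    """

    def __init__(self, noise_dim, hidden_dim, out_dim, rng):
        self.rng = rng
        self.noise_dim = noise_dim
        self.W1 = [[rng.gauss(0, 1 / math.sqrt(noise_dim)) for _ in range(noise_dim)]
                   for _ in range(hidden_dim)]
        self.b1 = [0.0] * hidden_dim
        self.W2 = [[rng.gauss(0, 1 / math.sqrt(hidden_dim)) for _ in range(hidden_dim)]
                   for _ in range(out_dim)]
        self.b2 = [0.0] * out_dim

    def sample(self):
        """Return one parameter vector w drawn from q."""
        eps = [self.rng.gauss(0, 1) for _ in range(self.noise_dim)]
        h = [math.tanh(sum(w * e for w, e in zip(row, eps)) + b)
             for row, b in zip(self.W1, self.b1)]
        return [sum(w * x for w, x in zip(row, h)) + b
                for row, b in zip(self.W2, self.b2)]


# Usage: draw a few parameter vectors for a hypothetical 10-parameter model.
sampler = NeuralSampler(noise_dim=4, hidden_dim=16, out_dim=10, rng=random.Random(0))
for _ in range(3):
    print([round(v, 3) for v in sampler.sample()])
```

In variational inference, the sampler's own weights (W1, b1, W2, b2 here) would be the variational parameters optimised so that the pushed-forward distribution approximates the posterior.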
Posterior Refinement Improves Sample Efficiency in Bayesian Neural Networks
Due to the non-linearity of NNs, no analytic solution to the integral exists, even when the likelihood and the approximate posterior are both Gaussian. A low-cost, unbiased, stochastic approximation can be obtained via Monte Carlo (MC) integration: obtain S samples from the approximate posterior and then compute the empirical expectation of the likelihood w.r.t. these samples.
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
- North America > Panama (0.04)
- North America > United States > New York > New York County > New York City (0.15)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
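The Monte Carlo approximation described in the posterior refinement excerpt, p(y | x, D) ≈ (1/S) Σ_s p(y | x, θ_s) with θ_s drawn from the approximate posterior, can be sketched as follows. As an assumption for brevity, a single logistic unit stands in for the neural network and the approximate posterior is a diagonal Gaussian N(mean, diag(std²)); the function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mc_predictive(x, mean, std, num_samples, rng):
    """Monte Carlo estimate of p(y=1 | x, D) ~= (1/S) * sum_s sigmoid(theta_s . x),
    with theta_s ~ N(mean, diag(std^2)) drawn independently for s = 1..S.

    Each term is an unbiased draw of the predictive probability, so the average
    is unbiased, and its variance shrinks as 1/S.
    """
    total = 0.0
    for _ in range(num_samples):
        theta = [rng.gauss(m, s) for m, s in zip(mean, std)]
        total += sigmoid(sum(t * xi for t, xi in zip(theta, x)))
    return total / num_samples


# Usage: the estimate stabilises as the number of samples S grows.
rng = random.Random(0)
for S in (10, 100, 1000):
    print(S, mc_predictive([1.0, 2.0], [0.5, -0.25], [1.0, 1.0], S, rng))
```

With zero posterior variance every sample equals the mean, and the estimate collapses to the point-estimate prediction sigmoid(mean · x).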
Hierarchical Gaussian Process Priors for Bayesian Neural Network Weights
Variational inference was employed in prior work to infer z (and w implicitly) and to obtain a point estimate of θ as a by-product of optimising the variational lower bound. Critically, in this representation the weights are only implicitly parametrized through these latent variables, which transforms inference on weights into inference of the much smaller collection of latent unit variables.
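The point that weights are implicitly parametrized through per-unit latent variables can be illustrated with a small sketch. It assumes, as a simplification, that each weight is a function of the latent codes of the two units it connects, w_ji = f([z_i, z_j]), and it uses a random-Fourier-feature approximation to a GP draw with an RBF kernel as a stand-in for f; the function names, kernel choice and sizes are hypothetical and not taken from the paper.

```python
import math
import random


def make_rff_function(input_dim, num_features, lengthscale, rng):
    """Approximate draw f ~ GP(0, RBF kernel with unit variance) via random Fourier
    features: f(x) = sqrt(2/D) * sum_k a_k * cos(omega_k . x + b_k), with
    a_k ~ N(0, 1), omega_k ~ N(0, I / lengthscale^2), b_k ~ U(0, 2*pi)."""
    omegas = [[rng.gauss(0, 1 / lengthscale) for _ in range(input_dim)]
              for _ in range(num_features)]
    phases = [rng.uniform(0, 2 * math.pi) for _ in range(num_features)]
    amps = [rng.gauss(0, 1) for _ in range(num_features)]
    scale = math.sqrt(2 / num_features)

    def f(x):
        return scale * sum(a * math.cos(sum(o * xi for o, xi in zip(om, x)) + b)
                           for a, om, b in zip(amps, omegas, phases))
    return f


def weights_from_latents(z_in, z_out, f):
    """W[j][i] = f(z_in[i] + z_out[j]) (list concatenation): every weight is
    determined by the latent codes of the two units it connects."""
    return [[f(zi + zj) for zi in z_in] for zj in z_out]


# Usage: a 50 -> 40 layer with 2-dimensional latent codes per unit.
rng = random.Random(0)
latent_dim, n_in, n_out = 2, 50, 40
f = make_rff_function(2 * latent_dim, 256, 1.0, rng)
z_in = [[rng.gauss(0, 1) for _ in range(latent_dim)] for _ in range(n_in)]
z_out = [[rng.gauss(0, 1) for _ in range(latent_dim)] for _ in range(n_out)]
W = weights_from_latents(z_in, z_out, f)
print("latent variables:", (n_in + n_out) * latent_dim)  # 180
print("weights:", n_in * n_out)                          # 2000
```

Inference then targets the (n_in + n_out) · d latent values rather than the n_in · n_out weights, which is the reduction the excerpt describes; nearby latent codes yield similar weights, so the GP also induces correlations among weights.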