mackay
Hierarchical Gaussian Process Priors for Bayesian Neural Network Weights
Variational inference was employed in prior work to infer z (and w implicitly), and to obtain a point estimate of θ, as a by-product of optimising the variational lower bound. Critically, in this representation weights are only implicitly parametrized through the use of these latent variables, which transforms inference on weights into inference of the much smaller collection of latent unit variables.
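To make the parameter-count argument concrete, here is a minimal sketch of drawing one layer's weight matrix from a Gaussian process over per-unit latent variables. It assumes (our assumption, not taken from the text above) that the GP input for weight w[j, i] is the concatenation of the latent vectors of the two units it connects, and that the kernel is a squared-exponential one; the function names `rbf_kernel` and `sample_layer_weights` are hypothetical. The sketch uses an exact Cholesky factor of the full covariance, which is only practical for small layers.

```python
import numpy as np


def rbf_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of x1 and x2."""
    sq_dists = (
        np.sum(x1**2, axis=1)[:, None]
        + np.sum(x2**2, axis=1)[None, :]
        - 2.0 * x1 @ x2.T
    )
    return variance * np.exp(-0.5 * np.maximum(sq_dists, 0.0) / lengthscale**2)


def sample_layer_weights(z_in, z_out, rng, jitter=1e-6):
    """Draw a weight matrix of shape (n_out, n_in) from a GP prior.

    Hypothetical construction: the GP input for w[j, i] is the
    concatenation [z_out[j], z_in[i]] of the two units' latent variables.
    """
    n_in, n_out = z_in.shape[0], z_out.shape[0]
    # One GP input per (output unit, input unit) pair, ordered row-major.
    pairs = np.concatenate(
        [np.repeat(z_out, n_in, axis=0), np.tile(z_in, (n_out, 1))], axis=1
    )
    cov = rbf_kernel(pairs, pairs) + jitter * np.eye(len(pairs))
    chol = np.linalg.cholesky(cov)
    # Correlated Gaussian sample with covariance `cov`.
    w_flat = chol @ rng.standard_normal(len(pairs))
    return w_flat.reshape(n_out, n_in)


rng = np.random.default_rng(0)
z_in = rng.standard_normal((30, 2))   # 30 input units, 2-D latents
z_out = rng.standard_normal((20, 2))  # 20 output units, 2-D latents
W = sample_layer_weights(z_in, z_out, rng)
print(W.shape)                                   # (20, 30)
print(z_in.size + z_out.size, "latent values vs", W.size, "weights")
```

With 2-D latents, 100 latent numbers index 600 weights; the gap widens as layers grow, since latents scale with n_in + n_out while weights scale with n_in · n_out.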
Bayesian Deep ICE
Datta, Jyotishka, Polson, Nicholas G.
Deep Independent Component Estimation (DICE) has many applications in modern-day machine learning as a feature-extraction method. We provide a novel latent variable representation of independent component analysis that enables both point estimates via expectation-maximization (EM) and full posterior sampling via Markov chain Monte Carlo (MCMC) algorithms. Our methodology also applies to flow-based methods for nonlinear feature extraction. We discuss how to implement conditional posteriors and envelope-based methods for optimization. Through this representation hierarchy, we unify a number of hitherto disjoint estimation procedures. We illustrate our methodology and algorithms on a numerical example. Finally, we conclude with directions for future research.
New Mathematical Formula Unveiled to Prevent AI From Making Unethical Decisions
Researchers from the UK and Switzerland have found a mathematical means of helping regulators and businesses police Artificial Intelligence systems' biases towards making unethical, and potentially very costly and damaging, choices. The collaborators from the University of Warwick, Imperial College London, and EPFL – Lausanne, along with the strategy firm Sciteb Ltd, believe that in an environment in which decisions are increasingly made without human intervention, there is a very strong incentive to know under what circumstances AI systems might adopt an unethical strategy, and to find and reduce that risk, or eliminate it entirely if possible. Artificial intelligence (AI) is increasingly deployed in commercial situations. Consider, for example, using AI to set the prices of insurance products to be sold to a particular customer. There are legitimate reasons for setting different prices for different people, but it may also be profitable in the short term to make certain decisions that end up hurting the company.
Fast Predictive Uncertainty for Classification with Bayesian Deep Networks
Hobbhahn, Marius, Kristiadi, Agustinus, Hennig, Philipp
In Bayesian Deep Learning, distributions over the output of classification neural networks are approximated by first constructing a Gaussian distribution over the weights, then sampling from it to receive a distribution over the categorical output distribution. This is costly. We reconsider old work to construct a Dirichlet approximation of this output distribution, which yields an analytic map between Gaussian distributions in logit space and Dirichlet distributions (the conjugate prior to the categorical) in the output space. We argue that the resulting Dirichlet distribution has theoretical and practical advantages, in particular more efficient computation of the uncertainty estimate, scaling to large datasets and networks like ImageNet and DenseNet. We demonstrate the use of this Dirichlet approximation by using it to construct a lightweight uncertainty-aware output ranking for the ImageNet setup.
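The analytic Gaussian-to-Dirichlet map can be sketched in a few lines. This is our reading of the map (often called the Laplace bridge), which we take to invert MacKay's Laplace approximation of a Dirichlet in the softmax basis; it assumes a diagonal logit covariance and applies the formula directly, without any extra preprocessing the paper may add. The function names are hypothetical, and the input numbers are made up for illustration.

```python
import numpy as np


def laplace_bridge(mu, sigma_diag):
    """Map a Gaussian over K logits (mean mu, diagonal variances sigma_diag)
    to Dirichlet concentration parameters alpha (assumed diagonal form)."""
    mu = np.asarray(mu, dtype=float)
    sigma_diag = np.asarray(sigma_diag, dtype=float)
    k = mu.shape[-1]
    sum_exp_neg = np.sum(np.exp(-mu), axis=-1, keepdims=True)
    return (1.0 - 2.0 / k + np.exp(mu) * sum_exp_neg / k**2) / sigma_diag


def dirichlet_to_gaussian(alpha):
    """Reverse direction: Laplace approximation of Dir(alpha) in the softmax
    basis, returning the logit mean and diagonal variances."""
    alpha = np.asarray(alpha, dtype=float)
    k = alpha.shape[-1]
    log_a = np.log(alpha)
    mu = log_a - log_a.mean(axis=-1, keepdims=True)
    sigma_diag = (1.0 / alpha) * (1.0 - 2.0 / k) + np.sum(
        1.0 / alpha, axis=-1, keepdims=True
    ) / k**2
    return mu, sigma_diag


mu = np.array([2.0, 0.5, -1.0])   # made-up logit means
var = np.array([0.3, 0.5, 0.8])   # made-up logit variances
alpha = laplace_bridge(mu, var)
probs = alpha / alpha.sum()       # Dirichlet mean = predictive class probabilities
print(alpha, probs)
```

No sampling is involved: one closed-form expression per class replaces the Monte Carlo loop over weight or logit samples, and the total concentration alpha.sum() gives a cheap confidence signal alongside the mean probabilities.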
Artificial intelligence will be big, so prepare
It seems like artificial intelligence is taking over the world, leaving many of us non-techies feeling terrified. Yet when you stop to think about it, we all use artificial intelligence (AI) every day. When we Google something, use Siri on our smartphones or ask Alexa a question, we are using AI. Hollywood has certainly featured AI in many movies from "The Terminator" series to "Robocop" and "I, Robot." In "Minority Report," algorithms predict who is going to commit a crime, and the person is arrested before the crime can be committed.
Fixing Variational Bayes: Deterministic Variational Inference for Bayesian Neural Networks
Wu, Anqi, Nowozin, Sebastian, Meeds, Edward, Turner, Richard E., Hernández-Lobato, José Miguel, Gaunt, Alexander L.
Bayesian neural networks (BNNs) hold great promise as a flexible and principled solution to deal with uncertainty when learning from finite data. Among approaches to realize probabilistic inference in deep neural networks, variational Bayes (VB) is theoretically grounded, generally applicable, and computationally efficient. Given these widely recognized advantages, why has variational Bayes seen so little practical use for BNNs in real applications? We argue that variational inference in neural networks is fragile: successful implementations require careful initialization and tuning of prior variances, as well as controlling the variance of Monte Carlo gradient estimates. We fix VB and turn it into a robust inference tool for Bayesian neural networks. We achieve this with two innovations: first, we introduce a novel deterministic method to approximate moments in neural networks, eliminating gradient variance; second, we introduce a hierarchical prior for parameters and a novel empirical Bayes procedure for automatically selecting prior variances. Combining these two innovations, the resulting method is highly efficient and robust. On the application of heteroscedastic regression we demonstrate strong predictive performance over alternative approaches.
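The idea of approximating moments deterministically, instead of sampling, can be illustrated with the standard closed-form moments of a rectified Gaussian. This is a simplified sketch, not the paper's method: it assumes independent Gaussian weights and independent units (diagonal variances only), whereas a full treatment would also track covariances between units. The function names and example numbers are hypothetical.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf)


def relu_moments(m, v):
    """Exact mean and variance of ReLU(x) for independent x ~ N(m, v), v > 0."""
    s = np.sqrt(v)
    r = m / s
    pdf = np.exp(-0.5 * r**2) / math.sqrt(2.0 * math.pi)   # standard normal pdf at r
    cdf = 0.5 * (1.0 + _erf(r / math.sqrt(2.0)))          # standard normal cdf at r
    mean = m * cdf + s * pdf
    second = (m**2 + v) * cdf + m * s * pdf               # E[ReLU(x)^2]
    return mean, np.maximum(second - mean**2, 0.0)


def linear_moments(m_a, v_a, w_mean, w_var, b=0.0):
    """Mean and variance of W @ a + b, assuming all weights and activations
    are mutually independent (elementwise weight means w_mean, variances w_var)."""
    mean = w_mean @ m_a + b
    var = (w_mean**2) @ v_a + w_var @ (m_a**2 + v_a)
    return mean, var


rng = np.random.default_rng(0)
x = np.array([1.0, -0.5, 2.0])                 # deterministic input
w_mean = 0.5 * rng.standard_normal((4, 3))     # hypothetical weight means
w_var = np.full((4, 3), 0.05)                  # hypothetical weight variances
h_mean, h_var = linear_moments(x, np.zeros(3), w_mean, w_var)
a_mean, a_var = relu_moments(h_mean, h_var)
print(a_mean, a_var)
```

Because every quantity is a closed-form function of the variational parameters, gradients of an objective built on these moments carry no Monte Carlo noise, which is the property the abstract highlights.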
A Practical Approach to Sizing Neural Networks
Friedland, Gerald, Metere, Alfredo, Krell, Mario
Based on MacKay's information-theoretic model of supervised machine learning [23], this article discusses how to practically estimate the maximum size of a neural network given a training data set. First, we present four easily applicable rules to analytically determine the capacity of neural network architectures. This allows the efficiency of different network architectures to be compared independently of a task. Second, we introduce and experimentally validate a heuristic method to estimate the neural network capacity requirement for a given dataset and labeling. This allows an estimate of the required size of a neural network for a given problem. We conclude the article with a discussion on the consequences of sizing the network wrongly, which include both increased computational effort for training and reduced generalization capability.
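The four capacity rules are not spelled out in this abstract. The sketch below assumes one plausible reading of them: a perceptron stores one bit per parameter (weights plus bias), capacities of perceptrons in the same layer add, and each perceptron output carries at most one bit, which caps every later layer at the number of outputs feeding it. These rules as coded are our assumption, not a quotation of the article, and `mlp_capacity_bits` is a hypothetical name.

```python
def mlp_capacity_bits(layer_sizes):
    """Estimate the capacity (in bits) of a fully connected threshold network.

    layer_sizes = [n_inputs, n_hidden_1, ..., n_outputs].
    Assumed rules (our paraphrase):
      * a perceptron with n inputs stores n + 1 bits (weights plus bias);
      * capacities of perceptrons in the same layer add up;
      * a layer after the first contributes at most as many bits as the
        number of outputs of the layer feeding it (1 bit per output).
    """
    total = 0
    for i, (n_in, n_out) in enumerate(zip(layer_sizes[:-1], layer_sizes[1:])):
        layer_bits = n_out * (n_in + 1)        # parameters of the whole layer
        if i > 0:
            layer_bits = min(layer_bits, n_in)  # capped by incoming 1-bit outputs
        total += layer_bits
    return total


# Compare two hypothetical architectures on the same 10-dimensional input.
for arch in ([10, 4, 1], [10, 2, 2, 1]):
    print(arch, mlp_capacity_bits(arch), "bits")
```

Under these assumed rules, almost all of the capacity sits in the first layer, so widening it raises the estimate far more than adding depth does; comparing architectures this way needs no training data, which matches the abstract's point that the rules are task-independent.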
For World's Newest Scrabble Stars, SHORT Tops SHORTER
LAGOS--Nigeria is beating the West at its own word game, using a strategy that sounds like Scrabble sacrilege. By relentlessly studying short words, this country of 500 languages has risen to dominate English's top lexical contest. Last November, for the final of Scrabble's 32-round World Championship in Australia, Nigeria's winningest wordsmith, Wellington Jighere, defeated Britain's Lewis Mackay, in a victory that led morning news broadcasts in his homeland half a world away. It was the crowning achievement for a nation that boasts more top-200 Scrabble players than any other country, including the U.K., Nigeria's former colonizer and one of the board game's legacy powers. "In other countries they see it as a game," said Mr. Jighere, now a borderline celebrity and talent scout for one of the world's few government-backed national programs.
Ensemble Learning for Multi-Layer Networks
Barber, David, Bishop, Christopher M.
In contrast to the maximum likelihood approach, which finds only a single estimate for the regression parameters, the Bayesian approach yields a distribution of weight parameters, p(w|D), conditional on the training data D, and predictions are ex…
*Present address: SNN, University of Nijmegen, Geert Grooteplein 21, Nijmegen, The Netherlands.
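The difference between a single weight estimate and a distribution over weights shows up most clearly at prediction time. The sketch below averages a model's output over samples from a Gaussian standing in for p(w|D). It assumes a hypothetical logistic-regression model and a factorized Gaussian with made-up parameters for brevity; it is an illustration of Bayesian model averaging, not the paper's own predictive computation.

```python
import numpy as np


def predict(x, w):
    """Hypothetical model: logistic regression, p(y=1 | x, w) = sigmoid(x @ w).
    x has shape (n_points, d); w has shape (d,) or (d, n_samples)."""
    return 1.0 / (1.0 + np.exp(-(x @ w)))


def posterior_predictive(x, w_mean, w_std, n_samples=2000, seed=0):
    """Monte Carlo estimate of p(y=1 | x, D) = integral of p(y=1 | x, w) q(w) dw,
    with q(w) an assumed factorized Gaussian standing in for p(w|D)."""
    rng = np.random.default_rng(seed)
    # Draw weight vectors from q(w): shape (n_samples, d).
    w_samples = w_mean + w_std * rng.standard_normal((n_samples, w_mean.size))
    # Average the predictions over the weight samples.
    return predict(x, w_samples.T).mean(axis=1)


x = np.array([[3.0, 0.0]])
w_mean = np.array([1.0, -0.5])   # made-up posterior mean
w_std = np.array([0.5, 0.5])     # made-up posterior standard deviations
print(predict(x, w_mean))                      # plug-in point estimate
print(posterior_predictive(x, w_mean, w_std))  # averaged over q(w)
```

In this example the averaged probability comes out less extreme than the plug-in value, because uncertainty in the first weight spreads the logit; with w_std set to zero the two coincide.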