Goto

Collaborating Authors

 Bayesian Learning


Noisy Natural Gradient as Variational Inference

arXiv.org Machine Learning

Variational Bayesian neural nets combine the flexibility of deep learning with Bayesian uncertainty estimation. Unfortunately, there is a tradeoff between cheap but simple variational families (e.g.~fully factorized) or expensive and complicated inference procedures. We show that natural gradient ascent with adaptive weight noise implicitly fits a variational posterior to maximize the evidence lower bound (ELBO). This insight allows us to train full-covariance, fully factorized, or matrix-variate Gaussian variational posteriors using noisy versions of natural gradient, Adam, and K-FAC, respectively, making it possible to scale up to modern-size ConvNets. On standard regression benchmarks, our noisy K-FAC algorithm makes better predictions and matches Hamiltonian Monte Carlo's predictive variances better than existing methods. Its improved uncertainty estimates lead to more efficient exploration in active learning, and intrinsic motivation for reinforcement learning.


Variational Recursive Dual Filtering

arXiv.org Machine Learning

State space models provide an interpretable framework for complex time series by combining an intuitive dynamical system model with a probabilistic observation model. We developed a flexible online learning framework for latent nonlinear state dynamics and filtered latent states. Our method utilizes the stochastic gradient variational Bayes method to jointly optimize the parameters of the nonlinear dynamics, observation model, and the black-box recognition model. Unlike previous approaches, our framework can incorporate non-trivial observation noise models and infer in real-time. We test our method on point process observations driven by continuous attractor dynamics, demonstrating its ability to recover the phase portrait, filtered trajectory, and produce long-term predictions for real-time machine learning.


ABC Samplers

arXiv.org Machine Learning

This Chapter, "ABC Samplers", is to appear in the forthcoming Handbook of Approximate Bayesian Computation (2018). It details the main ideas and algorithms used to sample from the ABC approximation to the posterior distribution, including methods based on rejection/importance sampling, MCMC and sequential Monte Carlo.


Bayesian shape modelling of cross-sectional geological data

arXiv.org Machine Learning

In particular, their cross-sectional shapes help determine their oil-bearing capacity. Current classification schemes for sand body shapes are qualitative, simple, and ad hoc, and so there is a need for a quantitative analysis with the help of statistical models. There are several problems of interest: estimation of shape class parameters given labelled data shapes (a'data shape' is an ordered set of points in R 2); classification of new data shapes; and unsupervised classification. Parameter estimation is described by the probability P(w y,c), where w denotes the shape class parameters andy the dataset, which consists of several data shapes, together with their class labelsc. By Bayes' theorem, this is given by: P(w y,c) P(y w,c) P(w).


A Unified View of Causal and Non-causal Feature Selection

arXiv.org Machine Learning

In this paper, we unify causal and non-causal feature selection methods based on the Bayesian network framework. We first show that the objectives of causal and non-causal feature selection methods are equal and are to find the Markov blanket of a class attribute, the theoretically optimal feature set for classification. We demonstrate that causal and non-causal feature selection take different assumptions of dependency among features to find Markov blanket, and their algorithms are shown different level of approximation for finding Markov blanket. In this framework, we are able to analyze the sample and error bounds of casual and non-causal methods. We conducted extensive experiments to show the correctness of our theoretical analysis.


A Multi-Disciplinary Review of Knowledge Acquisition Methods: From Human to Autonomous Eliciting Agents

arXiv.org Artificial Intelligence

This paper offers a multi-disciplinary review of knowledge acquisition methods in human activity systems. The review captures the degree of involvement of various types of agencies in the knowledge acquisition process, and proposes a classification with three categories of methods: the human agent, the human-inspired agent, and the autonomous machine agent methods. In the first two categories, the acquisition of knowledge is seen as a cognitive task analysis exercise, while in the third category knowledge acquisition is treated as an autonomous knowledge-discovery endeavour. The motivation for this classification stems from the continuous change over time of the structure, meaning and purpose of human activity systems, which are seen as the factor that fuelled researchers' and practitioners' efforts in knowledge acquisition for more than a century. We show through this review that the KA field is increasingly active due to the higher and higher pace of change in human activity, and conclude by discussing the emergence of a fourth category of knowledge acquisition methods, which are based on red-teaming and co-evolution.


Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling

arXiv.org Machine Learning

Recent advances in deep reinforcement learning have made significant strides in performance on applications such as Go and Atari games. However, developing practical methods to balance exploration and exploitation in complex domains remains largely unsolved. Thompson Sampling and its extension to reinforcement learning provide an elegant approach to exploration that only requires access to posterior samples of the model. At the same time, advances in approximate Bayesian methods have made posterior approximation for flexible neural network models practical. Thus, it is attractive to consider approximate Bayesian neural networks in a Thompson Sampling framework. To understand the impact of using an approximate posterior on Thompson Sampling, we benchmark well-established and recently developed methods for approximate posterior sampling combined with Thompson Sampling over a series of contextual bandit problems. We found that many approaches that have been successful in the supervised learning setting underperformed in the sequential decision-making scenario. In particular, we highlight the challenge of adapting slowly converging uncertainty estimates to the online setting.


Dynamic Bidding for Advance Commitments in Truckload Brokerage Markets

arXiv.org Machine Learning

Truckload brokerages, a $100 billion/year industry in the U.S., plays the critical role of matching shippers with carriers, often to move loads several days into the future. Brokerages not only have to find companies that will agree to move a load, the brokerage often has to find a price that both the shipper and carrier will agree to. The price not only varies by shipper and carrier, but also by the traffic lanes and other variables such as commodity type. Brokerages have to learn about shipper and carrier response functions by offering a price and observing whether each accepts the quote. We propose a knowledge gradient policy with bootstrap aggregation for high-dimensional contextual settings to guide price experimentation by maximizing the value of information. The learning policy is tested using a newly developed, carefully calibrated fleet simulator that includes a stochastic lookahead policy that simulates fleet movements, as well as the stochastic modeling of driver assignments and the carrier's load commitment policies with advance booking.


Generative Models of Visually Grounded Imagination

arXiv.org Machine Learning

It is easy for people to imagine what a man with pink hair looks like, even if they have never seen such a person before. We call the ability to create images of novel semantic concepts visually grounded imagination. In this paper, we show how we can modify variational auto-encoders to perform this task. Our method uses a novel training objective, and a novel product-of-experts inference network, which can handle partially specified (abstract) concepts in a principled and efficient way. We also propose a set of easy-to-compute evaluation metrics that capture our intuitive notions of what it means to have good visual imagination, namely correctness, coverage, and compositionality (the 3 C's). Finally, we perform a detailed comparison of our method with two existing joint image-attribute VAE methods (the JMVAE method of Suzuki et al. (2017) and the BiVCCA method of Wang et al. (2016b)) by applying them to two datasets: the MNIST-with-attributes dataset (which we introduce here), and the CelebA dataset (Liu et al., 2015).


Machine Learning Trick of the Day (7): Density Ratio Trick

@machinelearnbot

A probability on its own is often an uninteresting thing. But when we can compare probabilities, that is when their full splendour is revealed. By comparing probabilities we are able form judgements; by comparing probabilities we can exploit the elements of our world that are probable; by comparing probabilities we can see the value of objects that are rare. In their own ways, all machine learning tricks help us make better probabilistic comparisons. Comparison is the theme of this post--not discussed in this series before--and the right start to this second sprint of machine learning tricks.