Uncertainty
The Population Posterior and Bayesian Inference on Streams
McInerney, James, Ranganath, Rajesh, Blei, David M.
Many modern data analysis problems involve inferences from streaming data. However, streaming data is not easily amenable to the standard probabilistic modeling approaches, which assume that we condition on finite data. We develop population variational Bayes, a new approach for using Bayesian modeling to analyze streams of data. It approximates a new type of distribution, the population posterior, which combines the notion of a population distribution of the data with Bayesian inference in a probabilistic model. We study our method with latent Dirichlet allocation and Dirichlet process mixtures on several large-scale data sets.
Fast Adaptive Weight Noise
Bayer, Justin, Karl, Maximilian, Korhammer, Daniela, van der Smagt, Patrick
Marginalising out uncertain quantities within the internal representations or parameters of neural networks is of central importance for a wide range of learning techniques, such as empirical, variational or full Bayesian methods. We set out to generalise fast dropout (Wang & Manning, 2013) to cover a wider variety of noise processes in neural networks. This leads to an efficient calculation of the marginal likelihood and predictive distribution which evades sampling and the consequential increase in training time due to highly variant gradient estimates. This allows us to approximate variational Bayes for the parameters of feed-forward neural networks. Inspired by the minimum description length principle, we also propose and experimentally verify the direct optimisation of the regularised predictive distribution. The methods yield results competitive with previous neural network based approaches and Gaussian processes on a wide range of regression tasks.
The Mondrian Process for Machine Learning
This report is concerned with the Mondrian process and its applications in machine learning. The Mondrian process is a guillotine-partition-valued stochastic process that possesses an elegant self-consistency property. The first part of the report uses simple concepts from applied probability to define the Mondrian process and explore its properties. The Mondrian process has been used as the main building block of a clever online random forest classification algorithm that turns out to be equivalent to its batch counterpart. We outline a slight adaptation of this algorithm to regression, as the remainder of the report uses regression as a case study of how Mondrian processes can be utilized in machine learning. In particular, the Mondrian process will be used to construct a fast approximation to the computationally expensive kernel ridge regression problem with a Laplace kernel. The complexity of random guillotine partitions generated by a Mondrian process and hence the complexity of the resulting regression models is controlled by a lifetime hyperparameter. It turns out that these models can be efficiently trained and evaluated for all lifetimes in a given range at once, without needing to retrain them from scratch for each lifetime value. This leads to an efficient procedure for determining the right model complexity for a dataset at hand. The limitation of having a single lifetime hyperparameter will motivate the final Mondrian grid model, in which each input dimension is endowed with its own lifetime parameter. In this model we preserve the property that its hyperparameters can be tweaked without needing to retrain the modified model from scratch.
Classification with Noisy Labels by Importance Reweighting
In this paper, we study a classification problem in which sample labels are randomly corrupted. In this scenario, there is an unobservable sample with noise-free labels. However, before being observed, the true labels are independently flipped with a probability $\rho\in[0,0.5)$, and the random label noise can be class-conditional. Here, we address two fundamental problems raised by this scenario. The first is how to best use the abundant surrogate loss functions designed for the traditional classification problem when there is label noise. We prove that any surrogate loss function can be used for classification with noisy labels by using importance reweighting, with consistency assurance that the label noise does not ultimately hinder the search for the optimal classifier of the noise-free sample. The other is the open problem of how to obtain the noise rate $\rho$. We show that the rate is upper bounded by the conditional probability $P(y|x)$ of the noisy sample. Consequently, the rate can be estimated, because the upper bound can be easily reached in classification problems. Experimental results on synthetic and real datasets confirm the efficiency of our methods.
Type I and Type II Bayesian Methods for Sparse Signal Recovery using Scale Mixtures
In this paper, we propose a generalized scale mixture family of distributions, namely the Power Exponential Scale Mixture (PESM) family, to model the sparsity inducing priors currently in use for sparse signal recovery (SSR). We show that the successful and popular methods such as LASSO, Reweighted $\ell_1$ and Reweighted $\ell_2$ methods can be formulated in an unified manner in a maximum a posteriori (MAP) or Type I Bayesian framework using an appropriate member of the PESM family as the sparsity inducing prior. In addition, exploiting the natural hierarchical framework induced by the PESM family, we utilize these priors in a Type II framework and develop the corresponding EM based estimation algorithms. Some insight into the differences between Type I and Type II methods is provided and of particular interest in the algorithmic development is the Type II variant of the popular and successful reweighted $\ell_1$ method. Extensive empirical results are provided and they show that the Type II methods exhibit better support recovery than the corresponding Type I methods.
Fast Approximate Bayesian Computation for Estimating Parameters in Differential Equations
Ghosh, Sanmitra, Dasmahapatra, Srinandan, Maharatna, Koushik
Approximate Bayesian computation (ABC) using a sequential Monte Carlo method provides a comprehensive platform for parameter estimation, model selection and sensitivity analysis in differential equations. However, this method, like other Monte Carlo methods, incurs a significant computational cost as it requires explicit numerical integration of differential equations to carry out inference. In this paper we propose a novel method for circumventing the requirement of explicit integration by using derivatives of Gaussian processes to smooth the observations from which parameters are estimated. We evaluate our methods using synthetic data generated from model biological systems described by ordinary and delay differential equations. Upon comparing the performance of our method to existing ABC techniques, we demonstrate that it produces comparably reliable parameter estimates at a significantly reduced execution time.
The Angry Birds AI Competition
Renz, Jochen (The Australian National University) | Ge, Xiaoyu (The Australian National University) | Gould, Stephen (The Australian National University) | Zhang, Peng (The Australian National University)
The aim of the Angry Birds AI competition (AIBIRDS) is to build intelligent agents that can play new Angry Birds levels better than the best human players. This is surprisingly difficult for AI as it requires similar capabilities to what intelligent systems need for successfully interacting with the physical world, one of the grand challenges of AI. As such the competition offers a simplified and controlled environment for developing and testing the necessary AI technologies, a seamless integration of computer vision, machine learning, knowledge representation and reasoning, reasoning under uncertainty, planning, and heuristic search, among others. Over the past three years there have been significant improvements, but we are still a long way from reaching the ultimate aim and, thus, there are great opportunities for participants in this competition.
The Angry Birds AI Competition
Renz, Jochen (The Australian National University) | Ge, Xiaoyu (The Australian National University) | Gould, Stephen (The Australian National University) | Zhang, Peng (The Australian National University)
The aim of the Angry Birds AI competition (AIBIRDS) is to build intelligent agents that can play new Angry Birds levels better than the best human players. This is surprisingly difficult for AI as it requires similar capabilities to what intelligent systems need for successfully interacting with the physical world, one of the grand challenges of AI. As such the competition offers a simplified and controlled environment for developing and testing the necessary AI technologies, a seamless integration of computer vision, machine learning, knowledge representation and reasoning, reasoning under uncertainty, planning, and heuristic search, among others. Over the past three years there have been significant improvements, but we are still a long way from reaching the ultimate aim and, thus, there are great opportunities for participants in this competition.
Scalable Gaussian Process Classification via Expectation Propagation
Hernández-Lobato, Daniel, Hernández-Lobato, José Miguel
Variational methods have been recently considered for scaling the training process of Gaussian process classifiers to large datasets. As an alternative, we describe here how to train these classifiers efficiently using expectation propagation. The proposed method allows for handling datasets with millions of data instances. More precisely, it can be used for (i) training in a distributed fashion where the data instances are sent to different nodes in which the required computations are carried out, and for (ii) maximizing an estimate of the marginal likelihood using a stochastic approximation of the gradient. Several experiments indicate that the method described is competitive with the variational approach.
On the Convergence of Stochastic Variational Inference in Bayesian Networks
We highlight a pitfall when applying stochastic variational inference to general Bayesian networks. For global random variables approximated by an exponential family distribution, natural gradient steps, commonly starting from a unit length step size, are averaged to convergence. This useful insight into the scaling of initial step sizes is lost when the approximation factorizes across a general Bayesian network, and care must be taken to ensure practical convergence. We experimentally investigate how much of the baby (well-scaled steps) is thrown out with the bath water (exact gradients).