Uncertainty
Limbo: A Fast and Flexible Library for Bayesian Optimization
Cully, Antoine, Chatzilygeroudis, Konstantinos, Allocati, Federico, Mouret, Jean-Baptiste
Bayesian Optimization (BO) is designed for the most challenging ones: when the gradient is unknown, evaluating a solution is costly, and evaluations are noisy. This is, for instance, the case when we want to find optimal parameters for a machine learning algorithm [Snoek et al., 2012], because testing a set of parameters can take hours, and because of the stochastic nature of many machine learning algorithms. Besides parameter tuning, Bayesian optimization recently attracted a lot of interest for direct policy search in robot learning [Lizotte et al., 2007, Wilson et al., 2014, Calandra et al., 2016] and online adaptation; for example, it was recently used to allow a legged robot to learn a new gait after a mechanical damage in about 10-15 trials (2 minutes) [Cully et al., 2015]. At its core, Bayesian optimization builds a probabilistic model of the function to be optimized (the reward/performance/cost function) using the samples that have already been evaluated [Shahriari et al., 2016]; usually, this model is a Gaussian process [Williams and Rasmussen, 2006]. To select the next sample to be evaluated, Bayesian optimization optimizes an acquisition function which leverages the model to predict the most promising areas of the search space.
Optimal Learning for Stochastic Optimization with Nonlinear Parametric Belief Models
We consider the problem of estimating the expected value of information (the knowledge gradient) for Bayesian learning problems where the belief model is nonlinear in the parameters. Our goal is to maximize some metric, while simultaneously learning the unknown parameters of the nonlinear belief model, by guiding a sequential experimentation process which is expensive. We overcome the problem of computing the expected value of an experiment, which is computationally intractable, by using a sampled approximation, which helps to guide experiments but does not provide an accurate estimate of the unknown parameters. We then introduce a resampling process which allows the sampled model to adapt to new information, exploiting past experiments. We show theoretically that the method converges asymptotically to the true parameters, while simultaneously maximizing our metric. We show empirically that the process exhibits rapid convergence, yielding good results with a very small number of experiments.
Unimodal Thompson Sampling for Graph-Structured Arms
Paladino, Stefano, Trovò, Francesco, Restelli, Marcello, Gatti, Nicola
We study, to the best of our knowledge, the first Bayesian algorithm for unimodal Multi-Armed Bandit (MAB) problems with graph structure. In this setting, each arm corresponds to a node of a graph and each edge provides a relationship, unknown to the learner, between two nodes in terms of expected reward. Furthermore, for any node of the graph there is a path leading to the unique node providing the maximum expected reward, along which the expected reward is monotonically increasing. Previous results on this setting describe the behavior of frequentist MAB algorithms. In our paper, we design a Thompson Sampling-based algorithm whose asymptotic pseudo-regret matches the lower bound for the considered setting. We show that--as it happens in a wide number of scenarios--Bayesian MAB algorithms dramatically outperform frequentist ones. In particular, we provide a thorough experimental evaluation of the performance of our and state-of-the-art algorithms as the properties of the graph vary.
Statistical comparison of classifiers through Bayesian hierarchical modelling
Corani, Giorgio, Benavoli, Alessio, Demšar, Janez, Mangili, Francesca, Zaffalon, Marco
Usually one compares the accuracy of two competing classifiers via null hypothesis significance tests (nhst). Yet the nhst tests suffer from important shortcomings, which can be overcome by switching to Bayesian hypothesis testing. We propose a Bayesian hierarchical model which jointly analyzes the cross-validation results obtained by two classifiers on multiple data sets. It returns the posterior probability of the accuracies of the two classifiers being practically equivalent or significantly different. A further strength of the hierarchical model is that, by jointly analyzing the results obtained on all data sets, it reduces the estimation error compared to the usual approach of averaging the cross-validation results obtained on a given data set.
Probabilistic structure discovery in time series data
Janz, David, Paige, Brooks, Rainforth, Tom, van de Meent, Jan-Willem, Wood, Frank
Existing methods for structure discovery in time series data construct interpretable, compositional kernels for Gaussian process regression models. While the learned Gaussian process model provides posterior mean and variance estimates, typically the structure is learned via a greedy optimization procedure. This restricts the space of possible solutions and leads to over-confident uncertainty estimates. We introduce a fully Bayesian approach, inferring a full posterior over structures, which more reliably captures the uncertainty of the model.
MDL-motivated compression of GLM ensembles increases interpretability and retains predictive power
Hayete, Boris, Valko, Matthew, Greenfield, Alex, Yan, Raymond
Over the years, ensemble methods have become a staple of machine learning. Similarly, generalized linear models (GLMs) have become very popular for a wide variety of statistical inference tasks. The former have been shown to enhance out- of-sample predictive power and the latter possess easy interpretability. Recently, ensembles of GLMs have been proposed as a possibility. On the downside, this approach loses the interpretability that GLMs possess. We show that minimum description length (MDL)-motivated compression of the inferred ensembles can be used to recover interpretability without much, if any, downside to performance and illustrate on a number of standard classification data sets.
Neuron's Eye View: Inferring Features of Complex Stimuli from Neural Responses
Xin, null, Chen, null, Beck, Jeffrey M, Pearson, John M
Experiments that study neural encoding of stimuli at the level of individual neurons typically choose a small set of features present in the world --- contrast and luminance for vision, pitch and intensity for sound --- and assemble a stimulus set that systematically varies along these dimensions. Subsequent analysis of neural responses to these stimuli typically focuses on regression models, with experimenter-controlled features as predictors and spike counts or firing rates as responses. Unfortunately, this approach requires knowledge in advance about the relevant features coded by a given population of neurons. For domains as complex as social interaction or natural movement, however, the relevant feature space is poorly understood, and an arbitrary \emph{a priori} choice of features may give rise to confirmation bias. Here, we present a Bayesian model for exploratory data analysis that is capable of automatically identifying the features present in unstructured stimuli based solely on neuronal responses. Our approach is unique within the class of latent state space models of neural activity in that it assumes that firing rates of neurons are sensitive to multiple discrete time-varying features tied to the \emph{stimulus}, each of which has Markov (or semi-Markov) dynamics. That is, we are modeling neural activity as driven by multiple simultaneous stimulus features rather than intrinsic neural dynamics. We derive a fast variational Bayesian inference algorithm and show that it correctly recovers hidden features in synthetic data, as well as ground-truth stimulus features in a prototypical neural dataset. To demonstrate the utility of the algorithm, we also apply it to cluster neural responses and demonstrate successful recovery of features corresponding to monkeys and faces in the image set.
Probabilistic Duality for Parallel Gibbs Sampling without Graph Coloring
Mescheder, Lars, Nowozin, Sebastian, Geiger, Andreas
We present a new notion of probabilistic duality for random variables involving mixture distributions. Using this notion, we show how to implement a highly-parallelizable Gibbs sampler for weakly coupled discrete pairwise graphical models with strictly positive factors that requires almost no preprocessing and is easy to implement. Moreover, we show how our method can be combined with blocking to improve mixing. Even though our method leads to inferior mixing times compared to a sequential Gibbs sampler, we argue that our method is still very useful for large dynamic networks, where factors are added and removed on a continuous basis, as it is hard to maintain a graph coloring in this setup. Similarly, our method is useful for parallelizing Gibbs sampling in graphical models that do not allow for graph colorings with a small number of colors such as densely connected graphs.
Detecting User Intention Changes Using the Kullback-Leibler Distance
Demeester, Eric (University of Leuven) | Hüntemann, Alexander (University of Leuven)
Many people may benefit from assistive robots that understand their users’ intentions and aid them with the execution of these intentions in a safe and intuitive way through shared control. In the past, our research group has worked on semi-autonomous robotic wheelchairs transporting people with mobility challenges. Experimental results with our user-adaptive Bayesian approach for both intention estimation and shared human-machine decision-making under uncertainty have shown that in situations where the driver changes his or her intention, the assistive behavior by the robot may under certain conditions be counter-intuitive as it continues to take actions that are in line with the previous user intention, and this for too long a period of time. To remedy this, this paper proposes an approach to detect such changes in user plans in order to make the robot’s assistive behavior more reactive and thus more intuitive. The approach adopts a test that checks the consistency of the posterior distribution over user intentions with the given steering signals. A proof-of-concept study of this test’s performance is shown.
Resolution of Referential Ambiguity Using Dempster-Shafer Theoretic Pragmatics
Williams, Tom (Tufts University) | Scheutz, Matthias (Tufts University)
A major challenge for robots interacting with humans in realistic environments is handling robots' uncertainty with respect to the identities and properties of the people, places, and things found in their environments: a problem compounded when humans refer to these entities using underspecified language. In this paper, we present a framework for generating clarification requests in the face of both pragmatic and referential ambiguity, and show how we are able to handle several stages of this framework by integrating a Dempster-Shafer (DS)-theoretic pragmatic reasoning component with a probabilistic reference resolution component.