Goto

Collaborating Authors

 Uncertainty


Simultaneous Task Allocation and Planning Under Uncertainty

arXiv.org Artificial Intelligence

In many service robot applications, such as intra-logistics, surveillance or stock monitoring, it is desirable for a collection of tasks to be allocated to a team of robots. In this paper, we address applications such as these where tasks are independent (there are no inter-task dependencies) and each task only requires a single robot to complete it. Most existing approaches for solving this class of problems divide the problem into separate task allocation (TA) and planning processes. TA determines which robot should complete which tasks, and planning determines how each task, or conjunction of tasks, should be completed. This separation is made to reduce the computational complexity of the problem. It allows each robot to plan separately for its own task set, avoiding the need for a joint planning model which is typically exponential in the number of team members. This separation also allows specialised algorithms to be used for the TA and planning parts, increasing the efficiency with which the task-directed behaviour of the team can be generated. When doing this, TA usually assumes a greatly simplified model of planning in order to be able to efficiently compute allocations. However, this separation also means that the TA process cannot be informed by the plans of the individual robots, which prevents it from exploiting opportunities, or avoiding hindrances, that are only evident once planning has been performed.


Nonparametric Gaussian mixture models for the multi-armed contextual bandit

arXiv.org Machine Learning

The multi-armed bandit is a sequential allocation task where an agent must learn a policy that maximizes long term payoff, where only the reward of the played arm is observed at each iteration. In the stochastic setting, the reward for each action is generated from an unknown distribution, which depends on a given 'context', available at each interaction with the world. Thompson sampling is a generative, interpretable multi-armed bandit algorithm that has been shown both to perform well in practice, and to enjoy optimality properties for certain reward functions. Nevertheless, Thompson sampling requires sampling from parameter posteriors and calculation of expected rewards, which are possible for a very limited choice of distributions. We here extend Thompson sampling to more complex scenarios by adopting a very flexible set of reward distributions: nonparametric Gaussian mixture models. The generative process of Bayesian nonparametric mixtures naturally aligns with the Bayesian modeling of multi-armed bandits. This allows for the implementation of an efficient and flexible Thompson sampling algorithm: the nonparametric model autonomously determines its complexity in an online fashion, as it observes new rewards for the played arms. We show how the proposed method sequentially learns the nonparametric mixture model that best approximates the true underlying reward distribution. Our contribution is valuable for practical scenarios, as it avoids stringent model specifications, and yet attains reduced regret.


On Numerical Estimation of Joint Probability Distribution from Lebesgue Integral Quadratures

arXiv.org Machine Learning

An important application of Lebesgue integral quadrature[1] is developed. Given two random processes, $f(x)$ and $g(x)$, two generalized eigenvalue problems can be formulated and solved. In addition to obtaining two Lebesgue quadratures (for $f$ and $g$) from two eigenproblems, the projections of $f$-- and $g$-- eigenvectors on each other allow to build a joint distribution estimator, the most general form of which is a density--matrix correlation. The examples of the density--matrix correlation can be the value--correlation $V_{f_i;g_j}$, similar to the regular correlation concept, and a new one, the probability--correlation $P_{f_i;g_j}$. The theory is implemented numerically; the software is available under the GPLv3 license.


(Sequential) Importance Sampling Bandits

arXiv.org Machine Learning

The multi-armed bandit (MAB) problem is a sequential allocation task where the goal is to learn a policy that maximizes long term payoff, where only the reward of the executed action is observed; i.e., sequential optimal decisions are made, while simultaneously learning how the world operates. In the stochastic setting, the reward for each action is generated from an unknown distribution. To decide the next optimal action to take, one must compute sufficient statistics of this unknown reward distribution, e.g. upper-confidence bounds (UCB), or expectations in Thompson sampling. Closed-form expressions for these statistics of interest are analytically intractable except for simple cases. We here propose to leverage Monte Carlo estimation and, in particular, the flexibility of (sequential) importance sampling (IS) to allow for accurate estimation of the statistics of interest within the MAB problem. IS methods estimate posterior densities or expectations in probabilistic models that are analytically intractable. We first show how IS can be combined with state-of-the-art MAB algorithms (Thompson sampling and Bayes-UCB) for classic (Bernoulli and contextual linear-Gaussian) bandit problems. Furthermore, we leverage the power of sequential IS to extend the applicability of these algorithms beyond the classic settings, and tackle additional useful cases. Specifically, we study the dynamic linear-Gaussian bandit, and both the static and dynamic logistic cases too. The flexibility of (sequential) importance sampling is shown to be fundamental for obtaining efficient estimates of the key sufficient statistics in these challenging scenarios.


Efficient acquisition rules for model-based approximate Bayesian computation

arXiv.org Machine Learning

Approximate Bayesian computation (ABC) is a method for Bayesian inference when the likelihood is unavailable but simulating from the model is possible. However, many ABC algorithms require a large number of simulations, which can be costly. To reduce the computational cost, Bayesian optimisation (BO) and surrogate models such as Gaussian processes have been proposed. Bayesian optimisation enables one to intelligently decide where to evaluate the model next but common BO strategies are not designed for the goal of estimating the posterior distribution. Our paper addresses this gap in the literature. We propose to compute the uncertainty in the ABC posterior density, which is due to a lack of simulations to estimate this quantity accurately, and define a loss function that measures this uncertainty. We then propose to select the next evaluation location to minimise the expected loss. Experiments show that the proposed method often produces the most accurate approximations as compared to common BO strategies.


Deep Stacked Stochastic Configuration Networks for Non-Stationary Data Streams

arXiv.org Machine Learning

The concept of stochastic configuration networks (SCNs) others a solid framework for fast implementation of feedforward neural networks through randomized learning. Unlike conventional randomized approaches, SCNs provide an avenue to select appropriate scope of random parameters to ensure the universal approximation property. In this paper, a deep version of stochastic configuration networks, namely deep stacked stochastic configuration network (DSSCN), is proposed for modeling non-stationary data streams. As an extension of evolving stochastic connfiguration networks (eSCNs), this work contributes a way to grow and shrink the structure of deep stochastic configuration networks autonomously from data streams. The performance of DSSCN is evaluated by six benchmark datasets. Simulation results, compared with prominent data stream algorithms, show that the proposed method is capable of achieving comparable accuracy and evolving compact and parsimonious deep stacked network architecture.


Instance-Dependent PU Learning by Bayesian Optimal Relabeling

arXiv.org Machine Learning

When learning from positive and unlabelled data, it is a strong assumption that the positive observations are randomly sampled from the distribution of $X$ conditional on $Y = 1$, where X stands for the feature and Y the label. Most existing algorithms are optimally designed under the assumption. However, for many real-world applications, the observed positive examples are dependent on the conditional probability $P(Y = 1|X)$ and should be sampled biasedly. In this paper, we assume that a positive example with a higher $P(Y = 1|X)$ is more likely to be labelled and propose a probabilistic-gap based PU learning algorithms. Specifically, by treating the unlabelled data as noisy negative examples, we could automatically label a group positive and negative examples whose labels are identical to the ones assigned by a Bayesian optimal classifier with a consistency guarantee. The relabelled examples have a biased domain, which is remedied by the kernel mean matching technique. The proposed algorithm is model-free and thus do not have any parameters to tune. Experimental results demonstrate that our method works well on both generated and real-world datasets.


Probabilistic Causal Analysis of Social Influence

arXiv.org Machine Learning

Mastering the dynamics of social influence requires separating, in a database of information propagation traces, the genuine causal processes from temporal correlation, homophily and other spurious causes. However, most of the studies to characterize social influence and, in general, most data-science analyses focus on correlations, statistical independence, conditional independence etc.; only recently, there has been a resurgence of interest in "causal data science", e.g., grounded on causality theories. In this paper we adopt a principled causal approach to the analysis of social influence from information-propagation data, rooted in probabilistic causal theory. Our approach develops around two phases. In the first step, in order to avoid the pitfalls of misinterpreting causation when the data spans a mixture of several subtypes ("Simpson's paradox"), we partition the set of propagation traces in groups, in such a way that each group is as less contradictory as possible in terms of the hierarchical structure of information propagation. For this goal we borrow from the literature the notion of "agony" and define the Agony-bounded Partitioning problem, which we prove being hard, and for which we develop two efficient algorithms with approximation guarantees. In the second step, for each group from the first phase, we apply a constrained MLE approach to ultimately learn a minimal causal topology. Experiments on synthetic data show that our method is able to retrieve the genuine causal arcs w.r.t. a known ground-truth generative model. Experiments on real data show that, by focusing only on the extracted causal structures instead of the whole social network, we can improve the effectiveness of predicting influence spread.


Unbiased Implicit Variational Inference

arXiv.org Machine Learning

We develop unbiased implicit variational inference (UIVI), a method that expands the applicability of variational inference by defining an expressive variational family. UIVI considers an implicit variational distribution obtained in a hierarchical manner using a simple reparameterizable distribution whose variational parameters are defined by arbitrarily flexible deep neural networks. Unlike previous works, UIVI directly optimizes the evidence lower bound (ELBO) rather than an approximation to the ELBO. We demonstrate UIVI on several models, including Bayesian multinomial logistic regression and variational autoencoders, and show that UIVI achieves both tighter ELBO and better predictive performance than existing approaches at a similar computational cost.


Concentration bounds for empirical conditional value-at-risk: The unbounded case

arXiv.org Machine Learning

In several real-world applications involving decision making under uncertainty, the traditional expected value objective may not be suitable, as it may be necessary to control losses in the case of a rare but extreme event. Conditional Value-at-Risk (CVaR) is a popular risk measure for modeling the aforementioned objective. We consider the problem of estimating CVaR from i.i.d. samples of an unbounded random variable, which is either sub-Gaussian or sub-exponential. We derive a novel one-sided concentration bound for a natural sample-based CVaR estimator in this setting. Our bound relies on a concentration result for a quantile-based estimator for Value-at-Risk (VaR), which may be of independent interest.