Goto

Collaborating Authors

 Uncertainty


Learning causal Bayes networks using interventional path queries in polynomial time and sample complexity

arXiv.org Machine Learning

Causal discovery from empirical data is a fundamental problem in many scientific domains. Observational data allows for identifiability only up to Markov equivalence class. In this paper we first propose a polynomial time algorithm for learning the exact correctly-oriented structure of the transitive reduction of any causal Bayesian networks with high probability, by using interventional path queries. Each path query takes as input an origin node and a target node, and answers whether there is a directed path from the origin to the target. This is done by intervening the origin node and observing samples from the target node. We theoretically show the logarithmic sample complexity for the size of interventional data per path query, for continuous and discrete networks. We further extend our work to learn the transitive edges using logarithmic sample complexity (albeit in time exponential in the maximum number of parents for discrete networks). This allows us to learn the full network. We also provide an analysis of imperfect interventions.


Bayesian Approaches to Distribution Regression

arXiv.org Machine Learning

Distribution regression has recently attracted much interest as a generic solution to the problem of supervised learning where labels are available at the group level, rather than at the individual level. Current approaches, however, do not propagate the uncertainty in observations due to sampling variability in the groups. This effectively assumes that small and large groups are estimated equally well, and should have equal weight in the final regression. We account for this uncertainty with a Bayesian distribution regression formalism, improving the robustness and performance of the model when group sizes vary. We frame our models in a neural network style, allowing for simple MAP inference using backpropagation to learn the parameters, as well as MCMC-based inference which can fully propagate uncertainty. We demonstrate our approach on illustrative toy datasets, as well as on a challenging problem of predicting age from images.


New Marketing Insight from Unsupervised Bayesian Belief Networks

@machinelearnbot

"Limited-Service Restaurants" (LSRs) is how the restaurant industry refers collectively to fast food and fast-casual dining establishments. Marketers who specialize in LSRs often employ marketing research to evaluate hypotheses about their brands or to detect segments within their markets. An important additional purpose of market research is to understand the total structure of a market, to find out what guests consider important about the LSR experience. Without understanding the way that LSR guests think, marketers fly blind about what innovations in menu or service will appeal to guests. Fundamental market research helps with brand positioning and allocating marketing resources (Marketing Mix Analysis), and also in generating unexpected directions for additional research.


Ordered Preference Elicitation Strategies for Supporting Multi-Objective Decision Making

arXiv.org Machine Learning

In multi-objective decision planning and learning, much attention is paid to producing optimal solution sets that contain an optimal policy for every possible user preference profile. We argue that the step that follows, i.e, determining which policy to execute by maximising the user's intrinsic utility function over this (possibly infinite) set, is under-studied. This paper aims to fill this gap. We build on previous work on Gaussian processes and pairwise comparisons for preference modelling, extend it to the multi-objective decision support scenario, and propose new ordered preference elicitation strategies based on ranking and clustering. Our main contribution is an in-depth evaluation of these strategies using computer and human-based experiments. We show that our proposed elicitation strategies outperform the currently used pairwise methods, and found that users prefer ranking most. Our experiments further show that utilising monotonicity information in GPs by using a linear prior mean at the start and virtual comparisons to the nadir and ideal points, increases performance. We demonstrate our decision support framework in a real-world study on traffic regulation, conducted with the city of Amsterdam.


A Generative Deep Recurrent Model for Exchangeable Data

arXiv.org Machine Learning

We present a novel model architecture which leverages deep learning tools to perform exact Bayesian inference on sets of high dimensional, complex observations. Our model is provably exchangeable, meaning that the joint distribution over observations is invariant under permutation: this property lies at the heart of Bayesian inference. The model does not require variational approximations to train, and new samples can be generated conditional on previous samples, with cost linear in the size of the conditioning set. The advantages of our architecture are demonstrated on learning tasks requiring generalisation from short observed sequences while modelling sequence variability, such as conditional image generation, few-shot learning, set completion, and anomaly detection.


Nonparametric Bayesian Sparse Graph Linear Dynamical Systems

arXiv.org Machine Learning

A nonparametric Bayesian sparse graph linear dynamical system (SGLDS) is proposed to model sequentially observed multivariate data. SGLDS uses the Bernoulli-Poisson link together with a gamma process to generate an infinite dimensional sparse random graph to model state transitions. Depending on the sparsity pattern of the corresponding row and column of the graph affinity matrix, a latent state of SGLDS can be categorized as either a non-dynamic state or a dynamic one. A normal-gamma construction is used to shrink the energy captured by the non-dynamic states, while the dynamic states can be further categorized into live, absorbing, or noise-injection states, which capture different types of dynamical components of the underlying time series. The state-of-the-art performance of SGLDS is demonstrated with experiments on both synthetic and real data.


Decomposition of Uncertainty in Bayesian Deep Learning for Efficient and Risk-sensitive Learning

arXiv.org Machine Learning

Bayesian neural networks with latent variables (BNNs LVs) are scalable and flexible probabilistic models: They account for uncertainty in the estimation of the network weights and, by making use of latent variables, they can capture complex noise patterns in the data. In this work, we show how to separate these two forms of uncertainty for decision-making purposes. This decomposition allows us to successfully identify informative points for active learning of functions with heteroskedastic and bimodal noise. We also demonstrate how this decomposition allows us to define a novel risk-sensitive reinforcement learning criterion to identify policies that balance expected cost, model-bias and noise averseness.


Variational Sequential Monte Carlo

arXiv.org Machine Learning

Many recent advances in large scale probabilistic inference rely on variational methods. The success of variational approaches depends on (i) formulating a flexible parametric family of distributions, and (ii) optimizing the parameters to find the member of this family that most closely approximates the exact posterior. In this paper we present a new approximating family of distributions, the variational sequential Monte Carlo (VSMC) family, and show how to optimize it in variational inference. VSMC melds variational inference (VI) and sequential Monte Carlo (SMC), providing practitioners with flexible, accurate, and powerful Bayesian inference. The VSMC family is a variational family that can approximate the posterior arbitrarily well, while still allowing for efficient optimization of its parameters. We demonstrate its utility on state space models, stochastic volatility models for financial data, and deep Markov models of brain neural circuits.


High-dimensional Bayesian inference via the Unadjusted Langevin Algorithm

arXiv.org Machine Learning

We consider in this paper the problem of sampling a high-dimensional probability distribution $\pi$ having a density \wrt\ the Lebesgue measure on $\mathbb{R}^d$, known up to a normalization factor $x \mapsto \pi(x)= \mathrm{e}^{-U(x)}/\int_{\mathbb{R}^d} \mathrm{e}^{-U(y)} \mathrm{d}y$. Such problem naturally occurs for example in Bayesian inference and machine learning. Under the assumption that $U$ is continuously differentiable, $\nabla U$ is globally Lipschitz and $U$ is strongly convex, we obtain non-asymptotic bounds for the convergence to stationarity in Wasserstein distance of order $2$ and total variation distance of the sampling method based on the Euler discretization of the Langevin stochastic differential equation, for both constant and decreasing step sizes. The dependence on the dimension of the state space of the obtained bounds is studied to demonstrate the applicability of this method. The convergence of an appropriately weighted empirical measure is also investigated and bounds for the mean square error and exponential deviation inequality are reported for functions which are measurable and bounded. An illustration to Bayesian inference for binary regression is presented.


VBALD - Variational Bayesian Approximation of Log Determinants

arXiv.org Machine Learning

Evaluating the log determinant of a positive definite matrix is ubiquitous in machine learning. Applications thereof range from Gaussian processes, minimum-volume ellipsoids, metric learning, kernel learning, Bayesian neural networks, Determinental Point Processes, Markov random fields to partition functions of discrete graphical models. In order to avoid the canonical, yet prohibitive, Cholesky $\mathcal{O}(n^{3})$ computational cost, we propose a novel approach, with complexity $\mathcal{O}(n^{2})$, based on a constrained variational Bayes algorithm. We compare our method to Taylor, Chebyshev and Lanczos approaches and show state of the art performance on both synthetic and real-world datasets.