Directed Networks
Bayesian Approaches to Distribution Regression
Law, Ho Chung Leon, Sutherland, Dougal J., Sejdinovic, Dino, Flaxman, Seth
Distribution regression has recently attracted much interest as a generic solution to the problem of supervised learning where labels are available at the group level, rather than at the individual level. Current approaches, however, do not propagate the uncertainty in observations due to sampling variability in the groups. This effectively assumes that small and large groups are estimated equally well, and should have equal weight in the final regression. We account for this uncertainty with a Bayesian distribution regression formalism, improving the robustness and performance of the model when group sizes vary. We frame our models in a neural network style, allowing for simple MAP inference using backpropagation to learn the parameters, as well as MCMC-based inference which can fully propagate uncertainty. We demonstrate our approach on illustrative toy datasets, as well as on a challenging problem of predicting age from images.
New Marketing Insight from Unsupervised Bayesian Belief Networks
"Limited-Service Restaurants" (LSRs) is how the restaurant industry refers collectively to fast food and fast-casual dining establishments. Marketers who specialize in LSRs often employ marketing research to evaluate hypotheses about their brands or to detect segments within their markets. An important additional purpose of market research is to understand the total structure of a market, to find out what guests consider important about the LSR experience. Without understanding the way that LSR guests think, marketers fly blind about what innovations in menu or service will appeal to guests. Fundamental market research helps with brand positioning and allocating marketing resources (Marketing Mix Analysis), and also in generating unexpected directions for additional research.
A Generative Deep Recurrent Model for Exchangeable Data
Korshunova, Iryna, Degrave, Jonas, Huszรกr, Ferenc, Gal, Yarin, Gretton, Arthur, Dambre, Joni
We present a novel model architecture which leverages deep learning tools to perform exact Bayesian inference on sets of high dimensional, complex observations. Our model is provably exchangeable, meaning that the joint distribution over observations is invariant under permutation: this property lies at the heart of Bayesian inference. The model does not require variational approximations to train, and new samples can be generated conditional on previous samples, with cost linear in the size of the conditioning set. The advantages of our architecture are demonstrated on learning tasks requiring generalisation from short observed sequences while modelling sequence variability, such as conditional image generation, few-shot learning, set completion, and anomaly detection.
Adversarial classification: An adversarial risk analysis approach
Naveiro, Roi, Redondo, Alberto, Insua, David Rรญos, Ruggeri, Fabrizio
Classification is one of the most widely used instances of supervised learning, with applications in numerous fields including spam detection, Fan et al. (2016); computer vision, Chen (2015); and genomics, Zhou et al. (2005). In recent years, the field has experienced an enormous growth becoming a major research area in statistics and machine learning, Efron and Hastie (2016). Most efforts in classification have focused on obtaining more accurate algorithms which, however, largely ignore a relevant issue in many applications: the presence of adversaries who actively manipulate the data to fool the classifier so as to attain a benefit. As an example, when a spammer makes the classifier think that a spam is legit, he may profit by selling the information he gets from the victim. In such contexts, as classification algorithms improve, adversaries usually become smarter when making attacks.
Nonparametric Bayesian Sparse Graph Linear Dynamical Systems
Kalantari, Rahi, Ghosh, Joydeep, Zhou, Mingyuan
A nonparametric Bayesian sparse graph linear dynamical system (SGLDS) is proposed to model sequentially observed multivariate data. SGLDS uses the Bernoulli-Poisson link together with a gamma process to generate an infinite dimensional sparse random graph to model state transitions. Depending on the sparsity pattern of the corresponding row and column of the graph affinity matrix, a latent state of SGLDS can be categorized as either a non-dynamic state or a dynamic one. A normal-gamma construction is used to shrink the energy captured by the non-dynamic states, while the dynamic states can be further categorized into live, absorbing, or noise-injection states, which capture different types of dynamical components of the underlying time series. The state-of-the-art performance of SGLDS is demonstrated with experiments on both synthetic and real data.
Decomposition of Uncertainty in Bayesian Deep Learning for Efficient and Risk-sensitive Learning
Depeweg, Stefan, Hernรกndez-Lobato, Josรฉ Miguel, Doshi-Velez, Finale, Udluft, Steffen
Bayesian neural networks with latent variables (BNNs LVs) are scalable and flexible probabilistic models: They account for uncertainty in the estimation of the network weights and, by making use of latent variables, they can capture complex noise patterns in the data. In this work, we show how to separate these two forms of uncertainty for decision-making purposes. This decomposition allows us to successfully identify informative points for active learning of functions with heteroskedastic and bimodal noise. We also demonstrate how this decomposition allows us to define a novel risk-sensitive reinforcement learning criterion to identify policies that balance expected cost, model-bias and noise averseness.
Active Learning for Convolutional Neural Networks: A Core-Set Approach
Convolutional neural networks (CNNs) have been successfully applied to many recognition and learning tasks using a universal recipe; training a deep model on a very large dataset of supervised examples. However, this approach is rather restrictive in practice since collecting a large set of labeled images is very expensive. One way to ease this problem is coming up with smart ways for choosing images to be labelled from a very large collection (ie. active learning). Our empirical study suggests that many of the active learning heuristics in the literature are not effective when applied to CNNs in batch setting. Inspired by these limitations, we define the problem of active learning as core-set selection, ie. choosing set of points such that a model learned over the selected subset is competitive for the remaining data points. We further present a theoretical result characterizing the performance of any selected subset using the geometry of the datapoints. As an active learning algorithm, we choose the subset which is expected to yield best result according to our characterization. Our experiments show that the proposed method significantly outperforms existing approaches in image classification experiments by a large margin.
Variational Sequential Monte Carlo
Naesseth, Christian A., Linderman, Scott W., Ranganath, Rajesh, Blei, David M.
Many recent advances in large scale probabilistic inference rely on variational methods. The success of variational approaches depends on (i) formulating a flexible parametric family of distributions, and (ii) optimizing the parameters to find the member of this family that most closely approximates the exact posterior. In this paper we present a new approximating family of distributions, the variational sequential Monte Carlo (VSMC) family, and show how to optimize it in variational inference. VSMC melds variational inference (VI) and sequential Monte Carlo (SMC), providing practitioners with flexible, accurate, and powerful Bayesian inference. The VSMC family is a variational family that can approximate the posterior arbitrarily well, while still allowing for efficient optimization of its parameters. We demonstrate its utility on state space models, stochastic volatility models for financial data, and deep Markov models of brain neural circuits.
High-dimensional Bayesian inference via the Unadjusted Langevin Algorithm
We consider in this paper the problem of sampling a high-dimensional probability distribution $\pi$ having a density \wrt\ the Lebesgue measure on $\mathbb{R}^d$, known up to a normalization factor $x \mapsto \pi(x)= \mathrm{e}^{-U(x)}/\int_{\mathbb{R}^d} \mathrm{e}^{-U(y)} \mathrm{d}y$. Such problem naturally occurs for example in Bayesian inference and machine learning. Under the assumption that $U$ is continuously differentiable, $\nabla U$ is globally Lipschitz and $U$ is strongly convex, we obtain non-asymptotic bounds for the convergence to stationarity in Wasserstein distance of order $2$ and total variation distance of the sampling method based on the Euler discretization of the Langevin stochastic differential equation, for both constant and decreasing step sizes. The dependence on the dimension of the state space of the obtained bounds is studied to demonstrate the applicability of this method. The convergence of an appropriately weighted empirical measure is also investigated and bounds for the mean square error and exponential deviation inequality are reported for functions which are measurable and bounded. An illustration to Bayesian inference for binary regression is presented.
VBALD - Variational Bayesian Approximation of Log Determinants
Granziol, Diego, Wagstaff, Edward, Ru, Bin Xin, Osborne, Michael, Roberts, Stephen
Evaluating the log determinant of a positive definite matrix is ubiquitous in machine learning. Applications thereof range from Gaussian processes, minimum-volume ellipsoids, metric learning, kernel learning, Bayesian neural networks, Determinental Point Processes, Markov random fields to partition functions of discrete graphical models. In order to avoid the canonical, yet prohibitive, Cholesky $\mathcal{O}(n^{3})$ computational cost, we propose a novel approach, with complexity $\mathcal{O}(n^{2})$, based on a constrained variational Bayes algorithm. We compare our method to Taylor, Chebyshev and Lanczos approaches and show state of the art performance on both synthetic and real-world datasets.