Bayesian Inference
Bayes' Theorem with Lego
Looking at the picture above, you may have easily figured out \(P(\text{red} \mid \text{yellow})\) by thinking, "This is easy! There are 6 yellow pegs, and 4 of them are over red, so the probability of being over a red block if I'm on a yellow one is 4/6." If you followed this line of thinking, congratulations: you just independently discovered Bayes' Theorem! Of course, mathematical language is extremely concise, and human intuition easily skips steps in its reasoning, so getting from our intuition to Bayes' Theorem will require a bit of work. Let's begin formalizing this intuition by coming up with a way to calculate "there are 6 yellow pegs."
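As a rough check on the intuition above, the short sketch below recomputes the 4/6 figure as a ratio of two probabilities. Only the counts 6 (yellow pegs) and 4 (yellow pegs over red) come from the text; the total number of pegs, `TOTAL_PEGS`, and the helper name `conditional` are assumptions made for illustration, and the total cancels out of the result.

```python
from fractions import Fraction


def conditional(joint_count, condition_count, total):
    """Return P(A | B) = P(A and B) / P(B), computed from raw peg counts."""
    p_joint = Fraction(joint_count, total)          # P(A and B)
    p_condition = Fraction(condition_count, total)  # P(B)
    return p_joint / p_condition


YELLOW = 6           # yellow pegs (from the text)
YELLOW_OVER_RED = 4  # yellow pegs sitting over red (from the text)
TOTAL_PEGS = 60      # hypothetical total; any positive value gives the same answer

p_red_given_yellow = conditional(YELLOW_OVER_RED, YELLOW, TOTAL_PEGS)
print(p_red_given_yellow)  # 2/3, i.e. 4/6 in lowest terms
```

Because the total appears in both the numerator and the denominator, the answer depends only on the two counts, which is why the "4 out of 6" shortcut works.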
Sequential Dirichlet Process Mixtures of Multivariate Skew t-distributions for Model-based Clustering of Flow Cytometry Data
Hejblum, Boris P., Alkhassim, Chariff, Gottardo, Raphael, Caron, François, Thiébaut, Rodolphe
Flow cytometry is a high-throughput technology used to quantify multiple surface and intracellular markers at the level of a single cell. This makes it possible to identify cell sub-types and to determine their relative proportions. Improvements in this technology make it possible to describe millions of individual cells from a blood sample using multiple markers. This results in high-dimensional datasets, whose manual analysis is highly time-consuming and poorly reproducible. While several methods have been developed to perform automatic recognition of cell populations, most of them treat and analyze each sample independently. However, in practice, individual samples are rarely independent (e.g. longitudinal studies). Here, we propose to use a Bayesian nonparametric approach with a Dirichlet process mixture (DPM) of multivariate skew $t$-distributions to perform model-based clustering of flow-cytometry data. DPM models directly estimate the number of cell populations from the data, avoiding model selection issues, and skew $t$-distributions provide robustness to outliers and to the non-elliptical shape of cell populations. To accommodate repeated measurements, we propose a sequential strategy relying on a parametric approximation of the posterior. We illustrate the good performance of our method on simulated data, on an experimental benchmark dataset, and on new longitudinal data from the DALIA-1 trial, which evaluates a therapeutic vaccine against HIV. On the benchmark dataset, the sequential strategy outperforms all other methods evaluated, and it similarly leads to improved performance on the DALIA-1 data. We have made the method available to the community in the R package NPflow.
Admissibility of a posterior predictive decision rule
As reviewed by [Owhadi and Scovel], the field of statistical decision theory introduced by Wald, building on a game theoretic foundation developed by von Neumann and Morgenstern, provides links between Bayesian and frequentist statistical philosophies through the concepts of decision rules, admissibility, and risk functions, amongst others. Moreover, a recent thrust of research motivated by machine learning has put much emphasis on prediction problems, for which Bayesian methodology has been widely used. The purpose of this note is to demonstrate that classic decision theoretic results can be simply applied to the analysis of prediction problems. In fact, both [Berger] and [Robert] remark upon the ease of applying statistical decision theory within the context of prediction; however, no explicit result is stated in either work. The contribution of this note, therefore, is to highlight a simple way in which the results of statistical decision theory might apply to prediction problems. To the author's knowledge, the most similar lines of thought appear in work by [Nayak and El-Baz], where the loss function depends on the underlying parameter (in contrast to what follows).
Uncertainty measurement with belief entropy on interference effect in Quantum-Like Bayesian Networks
Huang, Zhiming, Yang, Lin, Jiang, Wen
Social dilemmas have been regarded as the essence of evolutionary game theory, in which the prisoner's dilemma game is the most famous metaphor for the problem of cooperation. Recent findings revealed that people's behavior violates the Sure Thing Principle in such games. Classic probability methodologies have difficulty explaining the underlying mechanisms of people's behavior. In this paper, a novel quantum-like Bayesian Network is proposed to accommodate the paradoxical phenomenon. The network can take interference into consideration, which is likely to be an efficient way to describe the underlying mechanism. With the assistance of a belief entropy known as Deng entropy, the paper proposes Belief Distance to render the model practical. Tested with empirical data, the proposed model is shown to be predictable and effective.
Roll-back Hamiltonian Monte Carlo
Yi, Kexin, Doshi-Velez, Finale
We propose a new framework for Hamiltonian Monte Carlo (HMC) on truncated probability distributions with smooth underlying density functions. Traditional HMC requires computing the gradient of the potential function associated with the target distribution, and therefore does not reach its full power on truncated distributions, which lack continuity and differentiability at the boundary. In our framework, we introduce a sharp sigmoid factor in the density function to approximate the probability drop at the truncation boundary. The target potential function is approximated by a new potential which smoothly extends to the entire sample space. HMC is then performed on the approximate potential. Our method is easy to implement and applies to a wide range of problems, and its computational efficiency on various sampling tasks is comparable to that of other baseline methods. Roll-back HMC (RBHMC) also gives rise to a new approach for Bayesian inference on constrained spaces.
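The sigmoid-smoothing idea in the abstract above can be illustrated with a toy one-dimensional sketch. This is not the authors' RBHMC implementation: the target (a standard normal truncated to x >= 0), the sharpness constant `K`, the step-size settings, and all function names are assumptions chosen for illustration. The truncated density is approximated by exp(-x^2/2) * sigmoid(K x), whose potential is smooth everywhere, and ordinary leapfrog HMC with a Metropolis correction is run on that potential.

```python
import numpy as np

K = 20.0  # assumed sigmoid sharpness; larger values give a sharper boundary


def potential(x):
    # U(x) = x^2/2 - log sigmoid(K x); logaddexp gives a stable -log sigmoid
    return 0.5 * x * x + np.logaddexp(0.0, -K * x)


def grad_potential(x):
    # d/dx of -log sigmoid(K x) is -K * sigmoid(-K x)
    sigmoid_neg = np.exp(-np.logaddexp(0.0, K * x))
    return x - K * sigmoid_neg


def hmc(n_samples, step=0.05, n_leapfrog=20, x0=1.0, seed=0):
    """Plain HMC on the smoothed potential (leapfrog + Metropolis accept)."""
    rng = np.random.default_rng(seed)
    x = x0
    samples = np.empty(n_samples)
    for i in range(n_samples):
        p = rng.standard_normal()          # resample the momentum
        x_new, p_new = x, p
        p_new -= 0.5 * step * grad_potential(x_new)   # initial half step
        for j in range(n_leapfrog):
            x_new += step * p_new                      # full position step
            if j < n_leapfrog - 1:
                p_new -= step * grad_potential(x_new)  # full momentum step
        p_new -= 0.5 * step * grad_potential(x_new)    # final half step
        h_old = potential(x) + 0.5 * p * p
        h_new = potential(x_new) + 0.5 * p_new * p_new
        if np.log(rng.uniform()) < h_old - h_new:      # Metropolis correction
            x = x_new
        samples[i] = x
    return samples


samples = hmc(4000)
print("fraction of samples with x > 0:", np.mean(samples > 0))
print("sample mean:", samples.mean())
```

In this sketch, a larger `K` tracks the hard boundary more closely but makes the potential stiffer near x = 0, so the leapfrog step size would need to shrink accordingly.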
A Brief Introduction to Machine Learning for Engineers
Department of Informatics, King's College London; osvaldo.simeone@kcl.ac.uk. Abstract: This monograph aims at providing an introduction to key concepts, algorithms, and theoretical frameworks in machine learning, including supervised and unsupervised learning, statistical learning theory, probabilistic graphical models and approximate inference. The intended readership consists of electrical engineers with a background in probability and linear algebra. The treatment builds on first principles, and organizes the main ideas according to clearly defined categories, such as discriminative and generative models, frequentist and Bayesian approaches, exact and approximate inference, directed and undirected models, and convex and non-convex optimization. The mathematical framework uses information-theoretic measures as a unifying tool. The text offers simple and reproducible numerical examples providing insights into key motivations and conclusions. Rather than providing exhaustive details on the existing myriad solutions in each specific category, for which the reader is referred to textbooks and papers, this monograph is meant as an entry point for an engineer into the literature on machine learning.
Distributed Bayesian Learning with Stochastic Natural-gradient Expectation Propagation and the Posterior Server
Hasenclever, Leonard, Webb, Stefan, Lienart, Thibaut, Vollmer, Sebastian, Lakshminarayanan, Balaji, Blundell, Charles, Teh, Yee Whye
This paper makes two contributions to Bayesian machine learning algorithms. Firstly, we propose stochastic natural gradient expectation propagation (SNEP), a novel alternative to expectation propagation (EP), a popular variational inference algorithm. SNEP is a black box variational algorithm, in that it does not require any simplifying assumptions on the distribution of interest, beyond the existence of some Monte Carlo sampler for estimating the moments of the EP tilted distributions. Further, as opposed to EP which has no guarantee of convergence, SNEP can be shown to be convergent, even when using Monte Carlo moment estimates. Secondly, we propose a novel architecture for distributed Bayesian learning which we call the posterior server. The posterior server allows scalable and robust Bayesian learning in cases where a data set is stored in a distributed manner across a cluster, with each compute node containing a disjoint subset of data. An independent Monte Carlo sampler is run on each compute node, with direct access only to the local data subset, but which targets an approximation to the global posterior distribution given all data across the whole cluster. This is achieved by using a distributed asynchronous implementation of SNEP to pass messages across the cluster. We demonstrate SNEP and the posterior server on distributed Bayesian learning of logistic regression and neural networks. Keywords: Distributed Learning, Large Scale Learning, Deep Learning, Bayesian Learning, Variational Inference, Expectation Propagation, Stochastic Approximation, Natural Gradient, Markov chain Monte Carlo, Parameter Server, Posterior Server.
Quantification of observed prior and likelihood information in parametric Bayesian modeling
Two data-dependent information metrics are developed to quantify the information of the prior and likelihood functions within a parametric Bayesian model, one of which is closely related to the reference priors of Berger, Bernardo, and Sun, and to the information measure introduced by Lindley. A combination of theoretical, empirical, and computational support provides evidence that these information-theoretic metrics may be useful diagnostic tools when performing a Bayesian analysis.
"I can assure you [$\ldots$] that it's going to be all right" -- A definition, case for, and survey of algorithmic assurances in human-autonomy trust relationships
In essence, people who interact with advanced technology want to be able to trust it appropriately, and then act on that trust. In interpersonal relationships, and otherwise, humans act largely based on trust. For example, a supervisor asks a subordinate to accomplish a task based on several factors that indicate they can trust them to accomplish that task. When consumers make purchases, they do so with trust that the product will perform as promised. Likewise, when using something like an autonomous vehicle, the user must be able to trust it appropriately in order to use it properly. With the rapid advancement of the capabilities of intelligent computing technology to do tasks that were previously assumed to be too complicated for computers, there has been much recent discussion regarding how humans can trust this technology - although the connection to trust is not always made explicit, per se.
A State-Space Approach to Dynamic Nonnegative Matrix Factorization
Mohammadiha, Nasser, Smaragdis, Paris, Panahandeh, Ghazaleh, Doclo, Simon
Nonnegative matrix factorization (NMF) has been actively investigated and used in a wide range of problems in the past decade. A significant amount of attention has been given to developing NMF algorithms that are suitable for modeling time series with strong temporal dependencies. In this paper, we propose a novel state-space approach to perform dynamic NMF (D-NMF). In the proposed probabilistic framework, the NMF coefficients act as the state variables and their dynamics are modeled using a multi-lag nonnegative vector autoregressive (N-VAR) model within the process equation. We use expectation maximization and propose a maximum-likelihood estimation framework to estimate the basis matrix and the N-VAR model parameters. Interestingly, the N-VAR model parameters are obtained by simply applying NMF. Moreover, we derive a maximum a posteriori estimate of the state variables (i.e., the NMF coefficients) that is based on a prediction step and an update step, similar to the Kalman filter. We illustrate the benefits of the proposed approach using different numerical simulations where D-NMF significantly outperforms its static counterpart. Experimental results for three different applications show that the proposed approach outperforms two state-of-the-art NMF approaches that exploit temporal dependencies, namely a nonnegative hidden Markov model and a frame stacking approach, while requiring less memory and computational power.