Bayesian Inference
Differentially Private Bayesian Learning on Distributed Data
Heikkilä, Mikko, Lagerspetz, Eemil, Kaski, Samuel, Shimizu, Kana, Tarkoma, Sasu, Honkela, Antti
Many applications of machine learning, for example in health care, would benefit from methods that can guarantee privacy of data subjects. Differential privacy (DP) has become established as a standard for protecting learning results. The standard DP algorithms require a single trusted party to have access to the entire data, which is a clear weakness. We consider DP Bayesian learning in a distributed setting, where each party only holds a single sample or a few samples of the data. We propose a learning strategy based on a secure multi-party sum function for aggregating summaries from data holders and the Gaussian mechanism for DP. Our method builds on an asymptotically optimal and practically efficient DP Bayesian inference with rapidly diminishing extra cost.
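The aggregation idea in this abstract can be illustrated with a minimal sketch. It is not the authors' protocol: the secure multi-party sum is replaced by a plain `sum`, the names `noisy_share` and `secure_sum` are hypothetical, and the noise scale `sigma` is an arbitrary assumed value rather than one calibrated to a privacy budget. The only point shown is that if each of N parties adds Gaussian noise with variance sigma^2 / N to its own summary, the aggregate carries Gaussian noise with variance sigma^2.

```python
import random


def noisy_share(value, sigma, n_parties, rng):
    """Return one party's summary plus its share of the Gaussian noise.

    Each party adds N(0, sigma^2 / n_parties), so the sum over all parties
    carries total noise N(0, sigma^2).
    """
    return value + rng.gauss(0.0, sigma / n_parties ** 0.5)


def secure_sum(shares):
    # Stand-in for a secure multi-party sum (assumption): a real protocol
    # would reveal only the total, never the individual shares.
    return sum(shares)


rng = random.Random(0)
# Hypothetical bounded per-party summaries, one value per data holder.
values = [rng.uniform(0.0, 1.0) for _ in range(1000)]
sigma = 2.0  # assumed noise scale, not derived from any (epsilon, delta)

noisy_total = secure_sum(noisy_share(v, sigma, len(values), rng) for v in values)
true_total = sum(values)
print(f"true total: {true_total:.2f}, noisy total: {noisy_total:.2f}")
```

Because the noise is added once in aggregate rather than once per party at full scale, the error of the total does not grow with the number of parties, which is the kind of saving a distributed scheme of this type is after.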
How Bayesian Inference Works
Since there are 25 long-haired women and 2 long-haired men, guessing that the ticket owner is a woman is a safe bet. To lay our foundation, we need to quickly mention four concepts: probabilities, conditional probabilities, joint probabilities and marginal probabilities. The probability of a thing happening is the number of ways that thing can happen divided by the total number of things that can happen. Combining these by multiplication gives the joint probability, P(woman with short hair) = P(woman) * P(short hair | woman).
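As a small worked example, the "safe bet" can be quantified using only the two counts quoted above. The variable names are illustrative, and the calculation assumes every long-haired person is equally likely to be the ticket owner.

```python
from fractions import Fraction

# Counts taken from the excerpt above.
long_haired_women = 25
long_haired_men = 2

# Probability = ways the thing can happen / total things that can happen.
# Here: long-haired women out of all long-haired people.
p_woman_given_long_hair = Fraction(
    long_haired_women, long_haired_women + long_haired_men
)

print(p_woman_given_long_hair)         # exact fraction
print(float(p_woman_given_long_hair))  # decimal approximation
```

The result, 25/27, is a conditional probability: it is the chance that the owner is a woman once we already know the owner has long hair.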
Density Estimation in Infinite Dimensional Exponential Families
Sriperumbudur, Bharath, Fukumizu, Kenji, Gretton, Arthur, Hyvärinen, Aapo, Kumar, Revant
In this paper, we consider an infinite dimensional exponential family, $\mathcal{P}$ of probability densities, which are parametrized by functions in a reproducing kernel Hilbert space, $H$ and show it to be quite rich in the sense that a broad class of densities on $\mathbb{R}^d$ can be approximated arbitrarily well in Kullback-Leibler (KL) divergence by elements in $\mathcal{P}$. The main goal of the paper is to estimate an unknown density, $p_0$ through an element in $\mathcal{P}$. Standard techniques like maximum likelihood estimation (MLE) or pseudo MLE (based on the method of sieves), which are based on minimizing the KL divergence between $p_0$ and $\mathcal{P}$, do not yield practically useful estimators because of their inability to efficiently handle the log-partition function. Instead, we propose an estimator, $\hat{p}_n$ based on minimizing the \emph{Fisher divergence}, $J(p_0\Vert p)$ between $p_0$ and $p\in \mathcal{P}$, which involves solving a simple finite-dimensional linear system. When $p_0\in\mathcal{P}$, we show that the proposed estimator is consistent, and provide a convergence rate of $n^{-\min\left\{\frac{2}{3},\frac{2\beta+1}{2\beta+2}\right\}}$ in Fisher divergence under the smoothness assumption that $\log p_0\in\mathcal{R}(C^\beta)$ for some $\beta\ge 0$, where $C$ is a certain Hilbert-Schmidt operator on $H$ and $\mathcal{R}(C^\beta)$ denotes the image of $C^\beta$. We also investigate the misspecified case of $p_0\notin\mathcal{P}$ and show that $J(p_0\Vert\hat{p}_n)\rightarrow \inf_{p\in\mathcal{P}}J(p_0\Vert p)$ as $n\rightarrow\infty$, and provide a rate for this convergence under a similar smoothness condition as above. Through numerical simulations we demonstrate that the proposed estimator outperforms the non-parametric kernel density estimator, and that the advantage with the proposed estimator grows as $d$ increases.
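For readers unfamiliar with the term, the Fisher divergence is usually defined as below. The abstract does not spell it out, so the normalisation (the factor of one half) is an assumption based on the common score-matching convention.

$$
J(p_0 \Vert p) = \frac{1}{2} \int p_0(x) \left\Vert \nabla_x \log p_0(x) - \nabla_x \log p(x) \right\Vert^2 \, dx
$$

Only gradients of $\log p$ with respect to $x$ appear here. For a density of the form $p(x) \propto \exp(f(x))$, the log-partition function is a constant in $x$ and vanishes under $\nabla_x$, which is why this objective sidesteps the difficulty the abstract attributes to KL-based estimators.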
Everything that Works Works Because it's Bayesian: Why Deep Nets Generalize?
The Bayesian community should really start going to ICLR. They really should have started going years ago. For too long we Bayesians have, quite arrogantly, dismissed deep neural networks as unprincipled, dumb black boxes that lack elegance. We said that highly over-parametrised models fitted via maximum likelihood can't possibly work, they will overfit, won't generalise, etc. We touted our Bayesian nonparametric models instead: Chinese restaurants, Indian buffets, Gaussian processes. And, when things started looking really dire for us Bayesians, we even formed an alliance with kernel people, who used to be our mortal enemies just years before because they like convex optimisation.
Empirically Grounded Agent-Based Models of Innovation Diffusion: A Critical Review
Zhang, Haifeng, Vorobeychik, Yevgeniy
Innovation diffusion has been studied extensively in a variety of disciplines, including sociology, economics, marketing, ecology, and computer science. Traditional literature on innovation diffusion has been dominated by models of aggregate behavior and trends. However, the agent-based modeling (ABM) paradigm is gaining popularity as it captures agent heterogeneity and enables fine-grained modeling of interactions mediated by social and geographic networks. While most ABM work on innovation diffusion is theoretical, empirically grounded models are increasingly important, particularly in guiding policy decisions. We present a critical review of empirically grounded agent-based models of innovation diffusion, developing a categorization of this research based on types of agent models as well as applications. By connecting the modeling methodologies in the fields of information and innovation diffusion, we suggest that the maximum likelihood estimation framework widely used in the former is a promising paradigm for calibration of agent-based models for innovation diffusion. Although many advances have been made to standardize ABM methodology, we identify four major issues in model calibration and validation, and suggest potential solutions.
Iterative Bayesian Learning for Crowdsourced Regression
Ok, Jungseul, Oh, Sewoong, Jang, Yunhun, Shin, Jinwoo, Yi, Yung
Crowdsourcing platforms emerged as popular venues for purchasing human intelligence at low cost for large volumes of tasks. As many low-paid workers are prone to give noisy answers, one of the fundamental questions is how to identify more reliable workers and exploit this heterogeneity to infer the true answers accurately. Despite significant research efforts for classification tasks with discrete answers, little attention has been paid to regression tasks with continuous answers. The popular Dawid-Skene model for discrete answers has the algorithmic and mathematical simplicity in relation to low-rank structures. But it does not generalize for continuous valued answers. To this end, we introduce a new probabilistic model for crowdsourced regression capturing the heterogeneity of the workers, generalizing the Dawid-Skene model to the continuous domain. We design a message-passing algorithm for Bayesian inference inspired by the popular belief propagation algorithm. We showcase its performance first by proving that it achieves a near optimal mean squared error by comparing it to an oracle estimator. Asymptotically, we can provide a tighter analysis showing that the proposed algorithm achieves the exact optimal performance. We next show synthetic experiments confirming our theoretical predictions. As a practical application, we further emulate a crowdsourcing system reproducing PASCAL visual object classes datasets and show that de-noising the crowdsourced data from the proposed scheme can significantly improve the performance for the vision task.
Christopher Fonnesbeck - Introduction to Statistical Modeling with Python - PyCon 2017
Speaker: Christopher Fonnesbeck. This intermediate-level tutorial will provide students with hands-on experience applying practical statistical modeling methods on real data. Unlike many introductory statistics courses, we will not be applying "cookbook" methods that are easy to teach, but often inapplicable; instead, we will learn some foundational statistical methods that can be applied generally to a wide variety of problems: maximum likelihood, bootstrapping, linear regression, and other modern techniques. The tutorial will start with a short introduction on data manipulation and cleaning using [pandas](http://pandas.pydata.org/). Slightly more advanced topics include bootstrapping (for estimating uncertainty around estimates) and flexible linear regression methods using Bayesian methods. By using and modifying hand-coded implementations of these techniques, students will gain an understanding of how each method works.
Approximate Inference with Amortised MCMC
Li, Yingzhen, Turner, Richard E., Liu, Qiang
We propose a novel approximate inference framework that approximates a target distribution by amortising the dynamics of a user-selected Markov chain Monte Carlo (MCMC) sampler. The idea is to initialise MCMC using samples from an approximation network, apply the MCMC operator to improve these samples, and finally use the samples to update the approximation network thereby improving its quality. This provides a new generic framework for approximate inference, allowing us to deploy highly complex, or implicitly defined approximation families with intractable densities, including approximations produced by warping a source of randomness through a deep neural network. Experiments consider Bayesian neural network classification and image modelling with deep generative models. Deep models trained using amortised MCMC are shown to generate realistic looking samples as well as producing diverse imputations for images with regions of missing pixels.
10 Free Must-Read Books for Machine Learning and Data Science
This book provides an introduction to statistical learning methods. It is aimed at upper-level undergraduate students, master's students and Ph.D. students in the non-mathematical sciences. The book also contains a number of R labs with detailed explanations on how to implement the various methods in real-life settings, and should be a valuable resource for a practicing data scientist.
Bayesian Machine Learning, Explained
So you know the Bayes rule. How does it relate to machine learning? It can be quite difficult to grasp how the puzzle pieces fit together - we know it took us a while. This article is an introduction we wish we had back then. While we have some grasp on the matter, we're not experts, so the following might contain inaccuracies or even outright errors. Feel free to point them out, either in the comments or privately.