Bayesian Inference


A Gentle Introduction to Maximum Likelihood Estimation

@machinelearnbot

The first time I heard someone use the term maximum likelihood estimation, I went to Google and found out what it meant. Then I went to Wikipedia to find out what it really meant. To spare you the wrestling required to understand and incorporate MLE into your data science workflow, ethos, and projects, I've compiled this guide. The old joke about the two camps is funny (if you follow this strange domain of humor) and mostly right about the differences between them, never mind that our Sun going nova is not really a repeatable experiment. Sorry, frequentists!
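To make the idea concrete, here is a minimal Python sketch (not from the article, and using invented data) that recovers a Gaussian's mean and standard deviation by maximizing the log-likelihood:

# Minimal MLE sketch: maximize the Gaussian log-likelihood of hypothetical data.
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=200)   # hypothetical observations

def neg_log_likelihood(params):
    mu, log_sigma = params                        # optimize log(sigma) so sigma stays positive
    return -np.sum(stats.norm.logpdf(data, loc=mu, scale=np.exp(log_sigma)))

result = optimize.minimize(neg_log_likelihood, x0=[0.0, 0.0], method="Nelder-Mead")
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])
print(f"MLE estimates: mu = {mu_hat:.3f}, sigma = {sigma_hat:.3f}")

The estimates land close to the true values (5 and 2), which is the whole point: MLE picks the parameters under which the observed data are most probable.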


Bayesian Statistics Coursera

@machinelearnbot

About this course: This course describes Bayesian statistics, in which one's inferences about parameters or hypotheses are updated as evidence accumulates. You will learn to use Bayes' rule to transform prior probabilities into posterior probabilities, and be introduced to the underlying theory and perspective of the Bayesian paradigm. The course will apply Bayesian methods to several practical problems, to show end-to-end Bayesian analyses that move from framing the question to building models to eliciting prior probabilities to implementing the final posterior distribution in R (free statistical software). Additionally, the course will introduce credible regions, Bayesian comparisons of means and proportions, Bayesian regression and inference using multiple models, and discussion of Bayesian prediction. We assume learners in this course have background knowledge equivalent to what is covered in the earlier three courses in this specialization: "Introduction to Probability and Data," "Inferential Statistics," and "Linear Regression and Modeling."
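As a taste of the prior-to-posterior update the course teaches (the course itself works in R; this standalone sketch uses Python and invented counts), here is Bayes' rule applied to a Beta prior on a coin's bias:

# Conjugate Beta-Binomial update: prior + data -> posterior, plus a credible interval.
from scipy import stats

prior_a, prior_b = 1.0, 1.0                        # Beta(1, 1): a flat prior on the bias
heads, tails = 7, 3                                # hypothetical evidence

post_a, post_b = prior_a + heads, prior_b + tails  # Bayes' rule in conjugate form
posterior = stats.beta(post_a, post_b)

print("posterior mean:", posterior.mean())
print("95% credible interval:", posterior.interval(0.95))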



Bayesian Optimal Pricing, Part 1

#artificialintelligence

Pricing is a common problem faced by businesses, and one that can be addressed effectively by Bayesian statistical methods. We'll step through a simple example and build the background necessary to get involved with this approach. Let's start with some hypothetical data. A small company has tried a few different price points (say, one week each) and recorded the demand at each price. We'll abstract away some economic issues in order to focus on the statistical approach.
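As a rough illustration of the setup, and not the article's own code, one might treat weekly demand at each price as Poisson with a log-linear price response and approximate the posterior on a grid; the prices, demands, and priors below are hypothetical:

# Grid approximation of the posterior for a log-linear Poisson demand model.
import numpy as np
from scipy import stats

prices = np.array([10.0, 12.0, 15.0, 20.0])        # hypothetical price points
demand = np.array([180, 150, 110, 70])             # units sold in each one-week test

a_grid = np.linspace(5.0, 12.0, 200)               # intercept
b_grid = np.linspace(-4.0, 0.0, 200)               # price elasticity (expected to be negative)
A, B = np.meshgrid(a_grid, b_grid, indexing="ij")

rate = np.exp(A[..., None] + B[..., None] * np.log(prices))   # Poisson mean at each price
log_post = stats.poisson.logpmf(demand, rate).sum(axis=-1)    # flat priors assumed
post = np.exp(log_post - log_post.max())
post /= post.sum()

print("posterior mean elasticity:", (post * B).sum())

With a posterior over the demand curve in hand, expected revenue at any candidate price can be averaged over parameter uncertainty, which is what makes the pricing decision Bayesian rather than a single point estimate.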


To Build Truly Intelligent Machines, Teach Them Cause and Effect - Quanta Magazine

#artificialintelligence

Artificial intelligence owes a lot of its smarts to Judea Pearl. In the 1980s he led efforts that allowed machines to reason probabilistically. In his latest book, "The Book of Why: The New Science of Cause and Effect," he argues that artificial intelligence has been handicapped by an incomplete understanding of what intelligence really is. Three decades ago, a prime challenge in artificial intelligence research was to program machines to associate a potential cause with a set of observable conditions. Pearl figured out how to do that using a scheme called Bayesian networks.
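For readers new to Bayesian networks, here is a toy inference-by-enumeration example (all probabilities invented, unrelated to the article) on the classic rain/sprinkler/wet-grass network:

# Given that the grass is wet, how likely is rain? Enumerate the tiny joint distribution.
P_rain = {True: 0.2, False: 0.8}
P_sprinkler = {True: 0.1, False: 0.9}
P_wet = {(True, True): 0.99, (True, False): 0.9,   # P(wet | rain, sprinkler)
         (False, True): 0.8, (False, False): 0.01}

def posterior_rain_given_wet():
    num, denom = 0.0, 0.0
    for rain in (True, False):
        for sprinkler in (True, False):
            p = P_rain[rain] * P_sprinkler[sprinkler] * P_wet[(rain, sprinkler)]
            denom += p
            if rain:
                num += p
    return num / denom

print("P(rain | grass is wet) =", posterior_rain_given_wet())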


A "quick" introduction to PyMC3 and Bayesian models

@machinelearnbot

We've all been there, maybe 15 minutes before a meeting, at 4 AM after a party, or simply when we feel too lazy to walk. And even though apps like Uber have made it relatively painless, there are still times when it is necessary or more practical to just wait for a taxi. So we wait, impatiently, probably wondering how long we will have to wait; that waiting time is exactly what we want to model. As the name implies, a generative model is a probability model which is able to generate data that looks a lot like the data we might gather from the phenomenon we're trying to model. In our case, we need a model that generates data that looks like waiting times.
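As a flavor of what such a model looks like in PyMC3 (a minimal stand-in with invented data, not the article's model), an exponential waiting-time model can be written and sampled in a few lines:

# Generative model of waiting times: Gamma prior on the arrival rate, exponential waits.
import numpy as np
import pymc3 as pm

observed_waits = np.array([3.2, 7.5, 1.1, 4.8, 9.0, 2.3, 5.6])   # minutes, invented

with pm.Model():
    rate = pm.Gamma("rate", alpha=1.0, beta=1.0)                  # taxis per minute, weak prior
    pm.Exponential("wait", lam=rate, observed=observed_waits)
    trace = pm.sample(1000, tune=1000, cores=1, return_inferencedata=False)
    posterior_pred = pm.sample_posterior_predictive(trace)        # generate look-alike waits

print("posterior mean of the expected wait (min):", float((1.0 / trace["rate"]).mean()))
print("one set of simulated waits:", posterior_pred["wait"][0])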



Bayesian Regularization for Graphical Models with Unequal Shrinkage

arXiv.org Machine Learning

We consider a Bayesian framework for estimating a high-dimensional sparse precision matrix, in which adaptive shrinkage and sparsity are induced by a mixture of Laplace priors. Besides discussing our formulation from the Bayesian standpoint, we investigate the MAP (maximum a posteriori) estimator from a penalized likelihood perspective that gives rise to a new non-convex penalty approximating the $\ell_0$ penalty. Optimal error rates for estimation consistency in terms of various matrix norms, along with selection consistency for sparse structure recovery, are shown for the unique MAP estimator under mild conditions. For fast and efficient computation, an EM algorithm is proposed to compute the MAP estimator of the precision matrix and (approximate) posterior probabilities on the edges of the underlying sparse structure. Through extensive simulation studies and a real application to call center data, we demonstrate the strong performance of our method compared with existing alternatives.
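For orientation, the mixture in question is of the spike-and-slab Laplace form; the display below is a generic sketch of that prior family, with notation and hyperparameter choices possibly differing from the paper:

\pi(\omega_{ij}) \;=\; \eta \, \frac{\lambda_1}{2} e^{-\lambda_1 |\omega_{ij}|} \;+\; (1-\eta) \, \frac{\lambda_0}{2} e^{-\lambda_0 |\omega_{ij}|}, \qquad \lambda_0 \gg \lambda_1,

so that the MAP objective carries a penalty $\mathrm{pen}(\omega_{ij}) = -\log \pi(\omega_{ij})$, which is non-convex and, for large $\lambda_0$, acts like an $\ell_0$-type penalty near zero while shrinking large entries only lightly (loosely, the "unequal shrinkage" of the title).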


Exploration by Distributional Reinforcement Learning

arXiv.org Machine Learning

We propose a framework based on distributional reinforcement learning and recent attempts to combine Bayesian parameter updates with deep reinforcement learning. We show that our proposed framework conceptually unifies multiple previous methods in exploration. We also derive a practical algorithm that achieves efficient exploration on challenging control tasks.


BelMan: Bayesian Bandits on the Belief--Reward Manifold

arXiv.org Machine Learning

We propose a generic, Bayesian, information geometric approach to the exploration--exploitation trade-off in multi-armed bandit problems. Our approach, BelMan, uniformly supports pure exploration, exploration--exploitation, and two-phase bandit problems. The knowledge about the bandit arms and their reward distributions is summarised by the barycentre of the joint distributions of beliefs and rewards of the arms, the \emph{pseudobelief-reward}, within the beliefs-rewards manifold. BelMan alternates \emph{information projection} and \emph{reverse information projection}, i.e., projection of the pseudobelief-reward onto the beliefs-rewards to choose the arm to play, and projection of the resulting beliefs-rewards onto the pseudobelief-reward. It introduces a mechanism that infuses an exploitative bias by means of a \emph{focal distribution}, i.e., a reward distribution that gradually concentrates on higher rewards. Comparative performance evaluation with state-of-the-art algorithms shows that BelMan is not only competitive but can also outperform other approaches in specific setups, for instance those involving many arms and continuous rewards.
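For background only, and not BelMan itself, the following sketch shows the per-arm belief bookkeeping that Bayesian bandit methods of this kind maintain; the arm-selection rule here is plain Thompson sampling, used purely as a stand-in for BelMan's projection-based choice:

# Beta-Bernoulli beliefs per arm, updated as rewards arrive (Thompson sampling selection).
import numpy as np

rng = np.random.default_rng(1)
true_means = [0.3, 0.5, 0.7]                        # hypothetical Bernoulli arms
beliefs = [[1.0, 1.0] for _ in true_means]          # Beta(alpha, beta) belief per arm

for t in range(1000):
    samples = [rng.beta(a, b) for a, b in beliefs]  # draw a plausible mean per arm
    arm = int(np.argmax(samples))                   # play the most promising arm
    reward = float(rng.random() < true_means[arm])
    beliefs[arm][0] += reward                       # alpha counts successes
    beliefs[arm][1] += 1.0 - reward                 # beta counts failures

print("posterior means per arm:", [a / (a + b) for a, b in beliefs])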