Uncertainty


A Gentle Introduction to Maximum Likelihood Estimation

@machinelearnbot

The first time I heard someone use the term maximum likelihood estimation, I went to Google and found out what it meant. Then I went to Wikipedia to find out what it really meant. To spare you the wrestling required to understand MLE and incorporate it into your data science workflow, ethos, and projects, I've compiled this guide. It is funny (if you follow this strange domain of humor) and mostly right about the differences between the frequentist and Bayesian camps. Never mind that our Sun going nova is not really a repeatable experiment -- sorry, frequentists!


r/MachineLearning - [R] To Build Truly Intelligent Machines, Teach Them Cause and Effect

@machinelearnbot

Just FYI: "Judea Pearl created the representational and computational foundation for the processing of information under uncertainty. He is credited with the invention of Bayesian networks,..." His wiki page doesn't list his contributions; it links to other people's summaries of his contributions. The guy you say is stuck in an old paradigm is one of the main inventors of the new paradigm.


Bayesian Statistics Coursera

@machinelearnbot

About this course: This course describes Bayesian statistics, in which one's inferences about parameters or hypotheses are updated as evidence accumulates. You will learn to use Bayes' rule to transform prior probabilities into posterior probabilities, and be introduced to the underlying theory and perspective of the Bayesian paradigm. The course will apply Bayesian methods to several practical problems, to show end-to-end Bayesian analyses that move from framing the question to building models to eliciting prior probabilities to implementing the final posterior distribution in R (free statistical software). Additionally, the course will introduce credible regions, Bayesian comparisons of means and proportions, Bayesian regression and inference using multiple models, and a discussion of Bayesian prediction. We assume learners in this course have background knowledge equivalent to what is covered in the earlier three courses in this specialization: "Introduction to Probability and Data," "Inferential Statistics," and "Linear Regression and Modeling."
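As a concrete illustration of the prior-to-posterior update the course teaches, here is a minimal sketch of Bayes' rule for a coin-flip (beta-binomial) model in Python. The prior parameters and data below are invented for the example, and the course itself works in R rather than Python.

```python
from scipy import stats

# Hypothetical data: 7 heads in 10 flips (illustrative only)
heads, flips = 7, 10

# Beta(2, 2) prior on the heads probability (an assumed, weakly informative choice)
alpha_prior, beta_prior = 2.0, 2.0

# Beta-binomial conjugacy: the posterior is Beta(alpha + heads, beta + tails)
alpha_post = alpha_prior + heads
beta_post = beta_prior + (flips - heads)

posterior = stats.beta(alpha_post, beta_post)
print(f"Posterior mean: {posterior.mean():.3f}")
# A 95% interval, one of the 'credible regions' the course introduces
print(f"95% credible interval: {posterior.interval(0.95)}")
```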



Bayesian Optimal Pricing, Part 1

#artificialintelligence

Pricing is a common problem faced by businesses, and one that can be addressed effectively by Bayesian statistical methods. We'll step through a simple example and build the background necessary to get started with this approach. Let's start with some hypothetical data. A small company has tried a few different price points (say, one week each) and recorded the demand at each price. We'll abstract away some economic issues in order to focus on the statistical approach.
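A minimal sketch of this setup, assuming a Poisson demand model with log-linear dependence on price; the price points, demand counts, and priors below are invented for illustration and are not taken from the article.

```python
import numpy as np
import pymc3 as pm

# Hypothetical price points (one week each) and observed unit demand (invented data)
prices = np.array([10.0, 12.0, 15.0, 18.0, 20.0])
demand = np.array([180, 150, 110, 80, 60])

with pm.Model() as pricing_model:
    # Log-linear demand curve: log E[demand] = a + b * log(price)
    a = pm.Normal("a", mu=0.0, sigma=10.0)
    b = pm.Normal("b", mu=0.0, sigma=5.0)  # expected negative (price elasticity)
    mu = pm.math.exp(a + b * np.log(prices))
    # Poisson likelihood for count-valued weekly demand
    pm.Poisson("obs", mu=mu, observed=demand)
    trace = pm.sample(1000, tune=1000, return_inferencedata=False)

# Posterior expected revenue at a candidate price, e.g. 14
price = 14.0
revenue = price * np.exp(trace["a"] + trace["b"] * np.log(price))
print(revenue.mean())
```

With posterior samples in hand, scanning candidate prices for the one maximizing expected revenue is a small extra loop over this last step.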


To Build Truly Intelligent Machines, Teach Them Cause and Effect Quanta Magazine

#artificialintelligence

Artificial intelligence owes a lot of its smarts to Judea Pearl. In the 1980s he led efforts that allowed machines to reason probabilistically. In his latest book, "The Book of Why: The New Science of Cause and Effect," he argues that artificial intelligence has been handicapped by an incomplete understanding of what intelligence really is. Three decades ago, a prime challenge in artificial intelligence research was to program machines to associate a potential cause with a set of observable conditions. Pearl figured out how to do that using a scheme called Bayesian networks.
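To make "associating a cause with observable conditions" concrete, here is a minimal sketch of the inference a two-node Bayesian network performs, computing P(cause | observation) via Bayes' rule; the structure and all probabilities are invented for illustration.

```python
# A minimal two-node Bayesian network: Cause -> Symptom.
# All probabilities below are invented for illustration.
p_cause = 0.01                   # prior P(cause)
p_symptom_given_cause = 0.9      # P(symptom | cause)
p_symptom_given_no_cause = 0.05  # P(symptom | no cause)

# Marginal probability of the observation (law of total probability)
p_symptom = (p_symptom_given_cause * p_cause
             + p_symptom_given_no_cause * (1 - p_cause))

# Bayes' rule: update belief in the cause after observing the symptom
p_cause_given_symptom = p_symptom_given_cause * p_cause / p_symptom
print(f"P(cause | symptom) = {p_cause_given_symptom:.3f}")  # ~0.154
```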


A "quick" introduction to PyMC3 and Bayesian models

@machinelearnbot

We've all been there: maybe 15 minutes before a meeting, at 4 AM after a party, or simply when we feel too lazy to walk. And even though apps like Uber have made it relatively painless, there are still times when it is necessary or practical to just wait for a taxi. So we wait, impatiently, probably while wondering how long we will have to wait. One way to reason about that wait is with a generative model. As the name implies, a generative model is a probability model that can generate data that looks a lot like the data we might gather from the phenomenon we're trying to model. In our case, we need a model that generates data that looks like waiting times.
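Here is a minimal sketch of such a generative model in PyMC3, assuming exponentially distributed waiting times and a handful of invented observations; the article's actual model and data may differ.

```python
import numpy as np
import pymc3 as pm

# Hypothetical observed taxi waiting times in minutes (invented for illustration)
waits = np.array([8.2, 3.5, 12.1, 5.7, 9.3, 4.4, 7.8, 6.1])

with pm.Model() as wait_model:
    # Prior on the arrival rate (taxis per minute)
    rate = pm.Gamma("rate", alpha=1.0, beta=1.0)
    # Exponential waiting times: the standard generative story for arrivals
    pm.Exponential("wait", lam=rate, observed=waits)
    trace = pm.sample(1000, tune=1000, return_inferencedata=False)

# The fitted model can now *generate* new waiting times that resemble the data
with wait_model:
    simulated = pm.sample_posterior_predictive(trace, samples=500)
print(simulated["wait"].mean())  # should be close to the observed mean
```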


Learning Generalized Hypergeometric Distribution (GHD) DAG models

arXiv.org Machine Learning

We introduce a new class of identifiable DAG models in which the conditional distribution of each node given its parents belongs to the family of generalized hypergeometric distributions (GHD). The GHD family includes many discrete distributions, such as the binomial, beta-binomial, Poisson, Poisson-type, displaced Poisson, hyper-Poisson, logarithmic, and many more. We prove that if the data are drawn from the new class of DAG models, one can fully identify the graph. We further provide a reliable and tractable algorithm that recovers the directed graph from finitely many samples. We show through theoretical results and simulations that our algorithm is statistically consistent even in high-dimensional settings ($p > n$) if the degree of the graph is bounded, and that it performs well compared to state-of-the-art DAG-learning algorithms.



Bayesian Regularization for Graphical Models with Unequal Shrinkage

arXiv.org Machine Learning

We consider a Bayesian framework for estimating a high-dimensional sparse precision matrix, in which adaptive shrinkage and sparsity are induced by a mixture of Laplace priors. Besides discussing our formulation from the Bayesian standpoint, we investigate the MAP (maximum a posteriori) estimator from a penalized likelihood perspective that gives rise to a new non-convex penalty approximating the $\ell_0$ penalty. Optimal error rates for estimation consistency in terms of various matrix norms, along with selection consistency for sparse structure recovery, are shown for the unique MAP estimator under mild conditions. For fast and efficient computation, an EM algorithm is proposed to compute the MAP estimator of the precision matrix and (approximate) posterior probabilities on the edges of the underlying sparse structure. Through extensive simulation studies and a real application to call center data, we demonstrate the strong performance of our method compared with existing alternatives.