# Bayesian inference

### What Is Probability?

Uncertainty involves making decisions with incomplete information, and this is the way we generally operate in the world. Handling uncertainty is typically described using everyday words like chance, luck, and risk. Probability is a field of mathematics that gives us the language and tools to quantify the uncertainty of events and reason in a principled manner. In this post, you will discover a gentle introduction to probability.

### 5 Reasons to Learn Probability for Machine Learning

Probability is a field of mathematics that quantifies uncertainty. It is undeniably a pillar of the field of machine learning, and many recommend it as a prerequisite subject to study prior to getting started. This is misleading advice, as probability makes more sense to a practitioner once they have the context of the applied machine learning process in which to interpret it. In this post, you will discover why machine learning practitioners should study probabilities to improve their skills and capabilities. Before we go through the reasons that you should learn probability, let's start off by taking a small look at the reason why you should not.

### Resources for Getting Started With Probability in Machine Learning

Machine learning is a field of computer science concerned with developing systems that can learn from data. Like statistics and linear algebra, probability is another foundational field that supports machine learning. Probability is a field of mathematics concerned with quantifying uncertainty. Many aspects of machine learning are uncertain, including, most critically, observations from the problem domain and the relationships learned by models from that data. As such, some understanding of probability, and of the tools and methods used in the field, is required for a machine learning practitioner to be effective.

### Bayesian Machine Learning

In the previous post we learned about the importance of latent variables in Bayesian modelling. Starting from this post, we will see Bayesian methods in action. We will walk through different aspects of machine learning and see how Bayesian methods help us design solutions, as well as the additional capabilities and insights they provide. The sections that follow are generally known as Bayesian inference.

### Consequences of Model Misspecification for Maximum Likelihood Estimation with Missing Data

Researchers are often faced with the challenge of developing statistical models with incomplete data. Exacerbating this situation is the possibility that either the researcher's complete-data model or the model of the missing-data mechanism is misspecified. In this article, we create a formal theoretical framework for developing statistical models and detecting model misspecification in the presence of incomplete data where maximum likelihood estimates are obtained by maximizing the observable-data likelihood function when the missing-data mechanism is assumed ignorable. First, we provide sufficient regularity conditions on the researcher's complete-data model to characterize the asymptotic behavior of maximum likelihood estimates in the simultaneous presence of both missing data and model misspecification. These results are then used to derive robust hypothesis testing methods for possibly misspecified models in the presence of Missing at Random (MAR) or Missing Not at Random (MNAR) missing data.

### @Bayes' Theorem For Bae

Bayes' Theorem is something that confuses and frustrates many, but is not as awful as many make it out to be. While the formula for "Bae's Theorem" given in the graphic above is silly, doesn't make mathematical sense, and borders on being NSFW, it does help illustrate what the problem statement is (something that throws many, as intuitively it seems kind of backwards). Given that Netflix is occurring, one would want to know the probability of 'chill', NOT the other way around. Granted, the right side of the equation is complete nonsense, but the left side is actually a good mnemonic device, especially given that part of the reason so many students tune out while learning mathematics is the dry sterility of the presentation. The theorem essentially states that the probability of event A given event B is equal to the probability of B given A, times the probability of A, divided by the probability of B: $P(A \mid B) = P(B \mid A)\,P(A) / P(B)$. This seems very complex until it is broken down bit by bit.
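To make the statement of the theorem concrete, here is a minimal Python sketch. The function name `bayes_posterior` and the example numbers are illustrative assumptions, not taken from the original post. The denominator P(B) is expanded with the law of total probability, so only P(A), P(B given A), and P(B given not A) are needed.

```python
def bayes_posterior(p_a, p_b_given_a, p_b_given_not_a):
    """Return P(A | B) from a prior and two likelihoods (hypothetical helper)."""
    # Law of total probability: P(B) = P(B|A) P(A) + P(B|not A) P(not A)
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1.0 - p_a)
    # Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B)
    return p_b_given_a * p_a / p_b


# Illustrative numbers only: A is rare (1%), B is likely when A holds (90%),
# and B still happens 5% of the time when A does not hold.
posterior = bayes_posterior(p_a=0.01, p_b_given_a=0.9, p_b_given_not_a=0.05)
print(f"P(A | B) = {posterior:.4f}")
```

Even though B is much more likely when A holds, the posterior stays modest because A is rare to begin with, which is the "backwards" intuition the theorem corrects.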

### How to code Gaussian Mixture Models from scratch in Python

In the realm of unsupervised learning algorithms, Gaussian Mixture Models or GMMs are special citizens. GMMs are based on the assumption that all data points come from a finite mixture of Gaussian distributions with unknown parameters. They are parametric generative models that attempt to learn the true data distribution. Hence, once we learn the Gaussian parameters, we can generate data from the same distribution as the source. We can think of GMMs as the soft generalization of the K-Means clustering algorithm.
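The two ideas in this paragraph, generating data from known Gaussian parameters and "soft" cluster assignment, can be sketched with the standard library alone. This is a one-dimensional illustration under assumed parameters, not the article's implementation; the names `sample_gmm` and `responsibilities` and all numeric values are hypothetical, and the parameters are taken as given rather than learned.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def sample_gmm(weights, means, stds, n, rng):
    """Draw n points: pick a component by its weight, then sample from it."""
    samples = []
    for _ in range(n):
        k = rng.choices(range(len(weights)), weights=weights)[0]
        samples.append(rng.gauss(means[k], stds[k]))
    return samples


def responsibilities(x, weights, means, stds):
    """Soft assignment: posterior probability of each component given x."""
    weighted = [w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    total = sum(weighted)
    return [v / total for v in weighted]


# Assumed mixture parameters, for illustration only.
rng = random.Random(0)
weights, means, stds = [0.3, 0.7], [-2.0, 3.0], [0.5, 1.0]

data = sample_gmm(weights, means, stds, 5, rng)
soft_assignments = [responsibilities(x, weights, means, stds) for x in data]
for x, resp in zip(data, soft_assignments):
    print(f"x = {x:+.2f}  responsibilities = {[round(r, 3) for r in resp]}")
```

Where K-Means would assign each point to exactly one cluster, the responsibilities here are probabilities that sum to one across components. A full fit would alternate this step with re-estimating the weights, means, and standard deviations.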

### Bayesian Machine Learning in Python: A/B Testing

In this course, while we will do traditional A/B testing in order to appreciate its complexity, what we will eventually get to is the Bayesian machine learning way of doing things. First, we'll see if we can improve on traditional A/B testing with adaptive methods. These all help you solve the explore-exploit dilemma.

Udemy course created by Lazy Programmer Inc.

What you'll learn:

- Use adaptive algorithms to improve A/B testing performance
- Understand the difference between Bayesian and frequentist statistics
- Apply Bayesian methods to A/B testing
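As a rough illustration of what an adaptive, Bayesian approach to A/B testing can look like, the sketch below uses Thompson sampling with Beta posteriors over conversion rates. The excerpt does not say which algorithms the course covers, so the choice of Thompson sampling, the function name `thompson_sampling`, and the simulated conversion rates are all assumptions made for this example.

```python
import random


def thompson_sampling(true_rates, n_rounds, rng):
    """Simulate an adaptive A/B test with Beta-Bernoulli Thompson sampling."""
    n_arms = len(true_rates)
    # Beta(1, 1) prior (uniform) on each variant's conversion rate.
    successes = [1] * n_arms
    failures = [1] * n_arms
    pulls = [0] * n_arms
    for _ in range(n_rounds):
        # Draw one plausible rate per variant from its current posterior.
        draws = [rng.betavariate(a, b) for a, b in zip(successes, failures)]
        arm = draws.index(max(draws))
        # Simulated visitor: converts with the variant's (hidden) true rate.
        converted = rng.random() < true_rates[arm]
        if converted:
            successes[arm] += 1
        else:
            failures[arm] += 1
        pulls[arm] += 1
    posterior_means = [a / (a + b) for a, b in zip(successes, failures)]
    return pulls, posterior_means


# Hypothetical conversion rates for variants A and B.
pulls, posterior_means = thompson_sampling([0.05, 0.30], 2000, random.Random(42))
print("visitors per variant:", pulls)
print("posterior mean rates:", [round(m, 3) for m in posterior_means])
```

Unlike a fixed 50/50 split, the allocation shifts toward the better-performing variant as evidence accumulates, which is one way of trading off exploration against exploitation.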

### $\alpha$ Belief Propagation as Fully Factorized Approximation

Belief propagation (BP) can do exact inference in loop-free graphs, but its performance can be poor in graphs with loops, and the understanding of its solution is limited. This work gives an interpretable belief propagation rule that is actually minimization of a localized $\alpha$-divergence. We term this algorithm $\alpha$ belief propagation ($\alpha$-BP). The performance of $\alpha$-BP is tested in MAP (maximum a posteriori) inference problems, where $\alpha$-BP can outperform (loopy) BP by a significant margin even in fully-connected graphs.

### Minimum Description Length Revisited

This is an up-to-date introduction to and overview of the Minimum Description Length (MDL) Principle, a theory of inductive inference that can be applied to general problems in statistics, machine learning and pattern recognition. While MDL was originally based on data compression ideas, this introduction can be read without any knowledge thereof. It takes into account all major developments since 2007, the last time an extensive overview was written. These include new methods for model selection and averaging and hypothesis testing, as well as the first completely general definition of *MDL estimators*. Incorporating these developments, MDL can be seen as a powerful extension of both penalized likelihood and Bayesian approaches, in which penalization functions and prior distributions are replaced by more general luckiness functions, average-case methodology is replaced by a more robust worst-case approach, and in which methods classically viewed as highly distinct, such as AIC vs BIC and cross-validation vs Bayes can, to a large extent, be viewed from a unified perspective.