Bayes Theorem provides a principled way for calculating a conditional probability. It is a deceptively simple calculation, although it can be used to easily calculate the conditional probability of events where intuition often fails. Bayes Theorem also provides a way for thinking about the evaluation and selection of different models for a given dataset in applied machine learning. Maximizing the probability of a model fitting a dataset is more generally referred to as maximum a posteriori, or MAP for short, and provides a probabilistic framework for predictive modeling. In this post, you will discover Bayes Theorem for calculating conditional probabilities.

In this article, I will provide a basic introduction to Bayesian learning and explore topics such as frequentist statistics, the drawbacks of the frequentist method, Bayes's theorem (introduced with an example), and the differences between the frequentist and Bayesian methods using the coin flip experiment as the example. To begin, let's try to answer this question: what is the frequentist method? When we flip a coin, there are two possible outcomes -- heads or tails. Of course, there is a third rare possibility where the coin balances on its edge without falling onto either side, which we assume is not a possible outcome of the coin flip for our discussion. We conduct a series of coin flips and record our observations i.e. the number of the heads (or tails) observed for a certain number of coin flips. In this experiment, we are trying to determine the fairness of the coin, using the number of heads (or tails) that we observe.

Bayesian Statistics continues to remain incomprehensible in the ignited minds of many analysts. Being amazed by the incredible power of machine learning, a lot of us have become unfaithful to statistics. Our focus has narrowed down to exploring machine learning. We fail to understand that machine learning is only one way to solve real world problems. In several situations, it does not help us solve business problems, even though there is data involved in these problems. To say the least, knowledge of statistics will allow you to work on complex analytical problems, irrespective of the size of data. In 1770s, Thomas Bayes introduced'Bayes Theorem'.

Imagine you're sleeping, and you hear strange noises in your front lawn. You're very sleepy, so you hypothesize that the strange noises are being generated by a hungry dinosaur. You think to yourself, 'this is exactly what I would hear if there was a dinosaur outside in my front lawn'. But then as you think more about it, you realize that the likelihood of there actually being a dinosaur in your front lawn is extremely low; whereas the likelihood of hearing strange noises from the front lawn is likely pretty high. So you exhale as you realize that the actual probability of there being a dinosaur in your front lawn, aka your original hypothesis, given the evidence is extremely low.

The article was written by Amber Zhou, a Financial Analyst at I Know First. Deep learning has become a buzzward in recent years. In fact, it has once gained much attention and excitements under the name neural networks early back in 1980's. However due to the lack of sufficient compute power and training examples, it gradually experienced a depression in the following decade. As we are entering the Era of Big Data in light of the explosion of computer power, deep learning has recently seen a revival.