In this article, I will provide a basic introduction to Bayesian learning and explore topics such as frequentist statistics, the drawbacks of the frequentist method, Bayes's theorem (introduced with an example), and the differences between the frequentist and Bayesian methods, using the coin flip experiment as the example.

To begin, let's try to answer this question: what is the frequentist method? When we flip a coin, there are two possible outcomes: heads or tails. There is a rare third possibility, where the coin balances on its edge without falling onto either side, but for our discussion we assume this is not a possible outcome. We conduct a series of coin flips and record our observations, that is, the number of heads (or tails) observed for a certain number of flips. In this experiment, we are trying to determine the fairness of the coin from the number of heads (or tails) that we observe.
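The experiment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `simulate_flips` and `estimate_heads_probability`, the fixed seed, and the choice of 1000 flips are all hypothetical and not part of the original discussion. The frequentist estimate of fairness here is simply the observed fraction of heads.

```python
import random


def simulate_flips(n, p_heads=0.5, seed=0):
    """Simulate n coin flips; True means heads.

    p_heads and seed are illustrative assumptions, not values from the article.
    The edge-landing outcome is ignored, as in the text.
    """
    rng = random.Random(seed)
    return [rng.random() < p_heads for _ in range(n)]


def estimate_heads_probability(flips):
    """Frequentist point estimate: the observed fraction of heads."""
    return sum(flips) / len(flips)


flips = simulate_flips(1000)
estimate = estimate_heads_probability(flips)
print(f"heads observed: {sum(flips)} of {len(flips)}")
print(f"estimated P(heads): {estimate:.3f}")
```

For a fair coin the estimate should sit near 0.5, and it tends to move closer as the number of flips grows. With only a handful of flips it can be far off, which is one of the drawbacks of relying on observed frequency alone.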

Bayesian statistics remains incomprehensible to many analysts. Amazed by the power of machine learning, many of us have neglected statistics and narrowed our focus to machine learning alone. We fail to understand that machine learning is only one way to solve real-world problems. In several situations, it does not help us solve business problems, even though these problems involve data. At the very least, knowledge of statistics will allow you to work on complex analytical problems, irrespective of the size of the data. Thomas Bayes's essay introducing what is now called Bayes's theorem was published posthumously in 1763.

If you have never taken a statistical inference class, you may think the second statement is just a paraphrase of the first. In fact, the difference is profound. It boils down to two opposing views of probability: frequentist and Bayesian. The frequentist defines probability as the long-run frequency of an event over many repeated trials. The Bayesian defines probability as a "belief." A belief can be strong or weak, and it is revised continuously as new evidence emerges.
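To make the idea of a belief that is revised by evidence more concrete, here is a minimal sketch using a coin's probability of heads. It assumes a Beta prior, which is a common but by no means the only choice. The function names, the uniform Beta(1, 1) starting point, and the batches of flips are made up for illustration.

```python
def update_beta(alpha, beta, heads, tails):
    """Posterior Beta parameters after observing more flips.

    Assumes a Beta(alpha, beta) prior on P(heads) and independent flips;
    under that assumption the posterior is Beta(alpha + heads, beta + tails).
    """
    return alpha + heads, beta + tails


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution: the current best guess for P(heads)."""
    return alpha / (alpha + beta)


# Start from a weak, uniform belief: every value of P(heads) is equally plausible.
alpha, beta = 1.0, 1.0

# Hypothetical evidence arriving in batches of (heads, tails).
batches = [(3, 1), (6, 4), (45, 55)]

for heads, tails in batches:
    alpha, beta = update_beta(alpha, beta, heads, tails)
    print(f"after {heads} heads, {tails} tails: "
          f"Beta({alpha:.0f}, {beta:.0f}), mean = {beta_mean(alpha, beta):.3f}")
```

Each batch shifts the belief, and the larger `alpha + beta` becomes, the less any single new batch moves it. In that sense a belief grows "stronger" as evidence accumulates.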

This is part one in a series of topics I consider fundamental to machine learning. Probability theory is a mathematical framework for quantifying our uncertainty about the world. It allows us (and our software) to reason effectively in situations where being certain is impossible. Probability theory is at the foundation of many machine learning algorithms. The goal of this post is to cover the vocabulary and mathematics needed before applying probability theory to machine learning applications.