The above equation represents Bayes Theorem in which it describes the probability of an event occurring P(A) based on our prior knowledge of events that may be related to that event P(B). Example of using Bayes theorem: I'll be using the tennis weather dataset. What is the probability of playing tennis given it is rainy? The probability of playing tennis when it is rainy is 60%. The process is very simple once you obtain the frequencies for each category.

Naive Bayes classifiers, a family of classifiers that are based on the popular Bayes' probability theorem, are known for creating simple yet well performing models, especially in the fields of document classification and disease prediction. In this first part of a series, we will take a look at the theory of naive Bayes classifiers and introduce the basic concepts of text classification. In following articles, we will implement those concepts to train a naive Bayes spam filter and apply naive Bayes to song classification based on lyrics. Starting more than half a century ago, scientists became very serious about addressing the question: "Can we build a model that learns from available data and automatically makes the right decisions and predictions?" Looking back, this sounds almost like a rhetoric question, and the answer can be found in numerous applications that are emerging from the fields of pattern classification, machine learning, and artificial intelligence. Data from various sensoring devices combined with powerful learning algorithms and domain knowledge led to many great inventions that we now take for granted in our everyday life: Internet queries via search engines like Google, text recognition at the post office, barcode scanners at the supermarket, the diagnosis of diseases, speech recognition by Siri or Google Now on our mobile phone, just to name a few.

Naive Bayes is a simple but surprisingly powerful algorithm for predictive modeling. In this post you will discover the Naive Bayes algorithm for classification. This post is written for developers and does not assume any background in statistics or probability, although knowing a little probability wouldn't hurt. Naive Bayes for Machine Learning Photo by John Morgan, some rights reserved. In machine learning we are often interested in selecting the best hypothesis (h) given data (d).

When I was six years old, I remember walking with my father to the doctor's office, which was in a clinic two towns from where we lived. When we reached the Afari clinic, the only nurse on duty recorded my vital symptoms, including my temperature, pulse, and blood pressure, and told us to wait for our turn. I was the 30th person in line to meet the only doctor available at the clinic. We waited for hours before it was finally my turn. The doctor went over my vital symptoms which were: Pressure: Normal; Temperature: High; Pulse: Normal.

Because the Bayes classifier is optimal, the Bayes error is the minimum possible error that can be made. Further, the model is often described in terms of classification, e.g. the Bayes Classifier. Nevertheless, the principle applies just as well to regression: that is, predictive modeling problems where a numerical value is predicted instead of a class label. It is a theoretical model, but it is held up as an ideal that we may wish to pursue. In theory we would always like to predict qualitative responses using the Bayes classifier. But for real data, we do not know the conditional distribution of Y given X, and so computing the Bayes classifier is impossible. Therefore, the Bayes classifier serves as an unattainable gold standard against which to compare other methods.