Naive Bayes classifiers are probabilistic models used for classification. They are based on Bayes' theorem together with an assumption of independence among predictors. In the real world the independence assumption may or may not hold, but Naive Bayes still performs well in practice. The name breaks down into two parts: "naive", because the model assumes that all features in the dataset are mutually independent, and "Bayes", because it is based on Bayes' theorem.
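As a quick numeric illustration of Bayes' theorem, P(C|X) = P(X|C)·P(C) / P(X), here is a minimal sketch; the three probabilities are invented purely for this example:

```python
# Bayes' theorem: P(C|X) = P(X|C) * P(C) / P(X).
# The probabilities below are made up for illustration only.
p_c = 0.3           # prior P(C)
p_x_given_c = 0.8   # likelihood P(X|C)
p_x = 0.5           # evidence P(X)

p_c_given_x = p_x_given_c * p_c / p_x  # posterior P(C|X)
print(p_c_given_x)  # 0.48
```

The classifier never needs anything fancier than this: a prior, a likelihood, and a normalizing evidence term.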

In this tutorial we will discuss the Naive Bayes text classifier. Naive Bayes is one of the simplest classifiers you can use, thanks to the simple mathematics involved and the fact that it is easy to implement in any standard programming language, including PHP, C#, Java, etc. Update: The Datumbox Machine Learning Framework is now open-source and free to download. Note that some of the techniques described below are used in Datumbox's Text Analysis service and power our API. The Naive Bayes classifier is a simple probabilistic classifier based on Bayes' theorem with strong (naive) independence assumptions. It is one of the most basic text classification techniques, with applications in email spam detection, personal email sorting, document categorization, sexually explicit content detection, language detection and sentiment detection.
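To show how little code the mathematics requires, here is a minimal multinomial Naive Bayes text classifier with add-one (Laplace) smoothing. The tiny training corpus and the spam/ham labels are invented for this sketch:

```python
import math
from collections import Counter

# Toy training corpus (invented for illustration).
train = [
    ("win money now", "spam"),
    ("free money offer", "spam"),
    ("meeting schedule today", "ham"),
    ("project status meeting", "ham"),
]

# Count word occurrences per class and documents per class.
word_counts = {"spam": Counter(), "ham": Counter()}
doc_counts = Counter()
for text, label in train:
    doc_counts[label] += 1
    word_counts[label].update(text.split())

vocab = {w for counts in word_counts.values() for w in counts}

def classify(text):
    best_label, best_score = None, float("-inf")
    for label in word_counts:
        # Log prior + sum of log likelihoods with add-one smoothing.
        score = math.log(doc_counts[label] / len(train))
        total = sum(word_counts[label].values())
        for word in text.split():
            count = word_counts[label][word]
            score += math.log((count + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

print(classify("free money"))     # spam
print(classify("meeting today"))  # ham
```

Working in log space avoids the numerical underflow you would get from multiplying many small probabilities, and the add-one smoothing keeps unseen words from zeroing out an entire class.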

Let us talk about Bayesian networks. A Bayesian network is a probabilistic model that represents a set of random variables and their conditional dependencies. It can be represented as a DAG (Directed Acyclic Graph), where nodes can be observable quantities, latent variables (not observable, only inferred) or unknown parameters or hypotheses, and edges represent conditional dependencies between nodes. The DAG makes the model easy to understand at a glance.

The Naive Bayes classifier is a simple classifier that is often used as a baseline for comparison with more complex classifiers. We will use the famous MNIST data set (pre-processed via PCA and normalized [TODO]) for this tutorial, so our class labels are {0, 1, …, 9}. If you're like me, you may have found this notation a little confusing at first. We can read the left side, P(C|X), as "the probability that the class is C given the data X" (the posterior). We can read P(X|C) on the right side as "the probability of observing the data X given the class C" (this is called the "likelihood"). So we can compute the probability that the class is 0 given the data, the probability that the class is 1 given the data, and so on, just by computing the probability of the data under a model of each class (how well the data fits each class's model) and picking the class with the highest posterior.
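That "fit a model per class, then take the argmax" recipe can be sketched with a Gaussian Naive Bayes classifier. Since we have not loaded MNIST yet, the synthetic two-feature data below stands in for the PCA-reduced features, with just two classes and invented means:

```python
import math
import random

# Gaussian Naive Bayes sketch: predict argmax over classes of
# log P(C) + sum_j log P(x_j | C). Synthetic data stands in for MNIST.
random.seed(0)

def make_class(mean, n=100):
    # Each sample has two independent features centered on `mean`.
    return [[random.gauss(mean, 1.0), random.gauss(mean, 1.0)] for _ in range(n)]

data = {0: make_class(-2.0), 1: make_class(2.0)}

# Fit: per-class, per-feature mean and variance (the "naive" factorization).
params = {}
for label, samples in data.items():
    n = len(samples)
    means = [sum(x[j] for x in samples) / n for j in range(2)]
    vars_ = [sum((x[j] - means[j]) ** 2 for x in samples) / n for j in range(2)]
    params[label] = (means, vars_)

def log_gauss(x, mu, var):
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

def predict(x):
    # Equal priors here, so the prior term is identical for every class
    # and can be dropped from the comparison.
    scores = {
        label: sum(log_gauss(x[j], m[j], v[j]) for j in range(2))
        for label, (m, v) in params.items()
    }
    return max(scores, key=scores.get)

print(predict([-2.0, -2.0]))  # 0
print(predict([2.0, 2.0]))    # 1
```

With ten digit classes instead of two, nothing changes except the number of entries in `params`: the prediction is still the class whose fitted model assigns the data the highest (log) probability.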