If you've been learning about data science or machine learning, there's a good chance you've heard the term "Bayes Theorem" before, or a "Bayes classifier". These concepts can be somewhat confusing, especially if you aren't used to thinking of probability from a traditional, frequentist statistics perspective. This article will attempt to explain the principles behind Bayes Theorem and how it's used in machine learning. Bayes Theorem is a method of calculating conditional probability. The traditional method of calculating conditional probability (the probability that one event occurs given the occurrence of a different event) is to use the conditional probability formula, calculating the joint probability of event one and event two occurring at the same time, and then dividing it by the probability of event two occurring.
Last summer, I was at a conference having lunch with Hal Daume III when we got to talking about how "Bayesian" can be a funny and ambiguous term. It seems like the definition should be straightforward: "following the work of English mathematician Rev. Thomas Bayes," perhaps, or even "uses Bayes' theorem." But many methods bearing the reverend's name or using his theorem aren't even considered "Bayesian" by his most religious followers. Why is it that Bayesian networks, for example, aren't considered… y'know… Bayesian? As I've read more outside the fields of machine learning and natural language processing -- from psychometrics and environmental biology to hackers who dabble in data science -- I've noticed three broad uses of the term "Bayesian."
In this tutorial we will discuss about Naive Bayes text classifier. Naive Bayes is one of the simplest classifiers that one can use because of the simple mathematics that are involved and due to the fact that it is easy to code with every standard programming language including PHP, C#, JAVA etc. Update: The Datumbox Machine Learning Framework is now open-source and free to download. Note that some of the techniques described below are used on Datumbox's Text Analysis service and they power up our API. The Naive Bayes classifier is a simple probabilistic classifier which is based on Bayes theorem with strong and naïve independence assumptions. It is one of the most basic text classification techniques with various applications in email spam detection, personal email sorting, document categorization, sexually explicit content detection, language detection and sentiment detection.
In my last blog, I discussed k-Nearest Neighbor machine learning algorithms with an example that was hopefully easy to understand for beginners. During the summer of 2017 I began a five-part series on types of machine learning. That series included more details about K-means clustering, Singular Value Decomposition, Principal Component Analysis, Apriori and Frequent Pattern-Growth. Today I want to expand on the ideas presented in my Naive Bayes "Data Science in 90 Seconds" You Tube video and continue the discussion in plain language.
Here's a situation you've got into: You are working on a classification problem and you have generated your set of hypothesis, created features and discussed the importance of variables. Within an hour, stakeholders want to see the first cut of the model. You have hunderds of thousands of data points and quite a few variables in your training data set. In such situation, if I were at your place, I would have used'Naive Bayes', which can be extremely fast relative to other classification algorithms. It works on Bayes theorem of probability to predict the class of unknown data set.