Herbrich, Ralf, Graepel, Thore

Subsequently, SVMs have been modified to handle regression [12] and GPs have been adapted to the problem of classification [8]. Both schemes essentially work in the same function space, which is characterised by kernels (SVM) and covariance functions (GP), respectively. While the formal similarity of the two methods is striking, the underlying paradigms of inference are very different. The SVM was inspired by results from statistical/PAC learning theory, while GPs are usually considered in a Bayesian framework. This ideological clash can be viewed as a continuation in machine learning of the by now classical disagreement between Bayesian and frequentist statistics.
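The shared function space can be made concrete with a small sketch. The example below is an illustration only, not code from the paper: it assumes a squared-exponential (RBF) function with a hypothetical `length_scale` parameter and evaluates it on three made-up points. The resulting matrix is what an SVM would use as its kernel (Gram) matrix and what a GP would use as the prior covariance of the function values at those points.

```python
import math


def rbf(x, z, length_scale=1.0):
    """Squared-exponential function of two points (assumed for illustration)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))


def gram_matrix(points, kernel=rbf):
    """Evaluate the kernel on every pair of points."""
    return [[kernel(p, q) for q in points] for p in points]


# Three made-up one-dimensional inputs.
points = [[0.0], [0.5], [2.0]]

# SVM view: kernel (Gram) matrix. GP view: prior covariance matrix.
K = gram_matrix(points)

for row in K:
    print(["%.3f" % value for value in row])
```

Nearby inputs get entries close to 1 and distant inputs get entries close to 0, whichever of the two readings is applied to the matrix.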

In my last blog post, I discussed k-Nearest Neighbor machine learning algorithms with an example that was hopefully easy for beginners to understand. During the summer of 2017 I began a five-part series on types of machine learning. That series included more details about K-means clustering, Singular Value Decomposition, Principal Component Analysis, Apriori and Frequent Pattern-Growth. Today I want to expand on the ideas presented in my Naive Bayes "Data Science in 90 Seconds" YouTube video and continue the discussion in plain language.

The Naive Bayes classifier is a simple classifier that is often used as a baseline for comparison with more complex classifiers. We will use the famous MNIST data set (pre-processed via PCA and normalized [TODO]) for this tutorial, so our class labels are {0, 1, …, 9}. If you're like me, you may have found this notation a little confusing at first. We can read the left side, P(C | X), as "the probability that the class is C given the data X". We can read the right side, P(X | C), as "the probability of the data X given that the class is C" (this is called the "likelihood"). We can then compute the probability that the class is 0 given the data, the probability that the class is 1 given the data, and so on, just by computing the probability of the data under each class (how well the data fits a model of each class).
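To make the idea of scoring the data under each class concrete, here is a minimal sketch of a Gaussian Naive Bayes classifier in plain Python. It is not the tutorial's own implementation: it assumes each feature is modelled with an independent Gaussian per class, it includes the class prior P(C) alongside the likelihood, and it runs on a tiny made-up two-class data set rather than MNIST. The names `fit_gaussian_nb`, `log_joint`, `posterior`, `predict` and `var_floor` are all hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate a log prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # var_floor (an assumption) avoids division by zero for constant features.
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / len(X)), means, variances)
    return model


def log_joint(model, x):
    """Return log P(C) + log P(x | C) for every class C."""
    scores = {}
    for label, (log_prior, means, variances) in model.items():
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances)
        )
        scores[label] = log_prior + log_likelihood
    return scores


def posterior(model, x):
    """Normalise the joint scores into P(C | x)."""
    scores = log_joint(model, x)
    top = max(scores.values())  # subtract the max for numerical stability
    unnorm = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(unnorm.values())
    return {c: u / total for c, u in unnorm.items()}


def predict(model, x):
    """Pick the class with the highest joint score."""
    scores = log_joint(model, x)
    return max(scores, key=scores.get)


# Made-up toy data: two well-separated classes with two features each.
X_toy = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [3.0, 3.1], [3.2, 2.9], [2.9, 3.0]]
y_toy = [0, 0, 0, 1, 1, 1]

model = fit_gaussian_nb(X_toy, y_toy)
print(predict(model, [0.1, 0.1]), predict(model, [3.0, 3.0]))
```

Working in log space keeps the product of many small per-feature probabilities from underflowing, which would matter for data with many features.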

Classification is the task of grouping things together on the basis of the similarities they share. It helps organize things and thus makes their study easier and more systematic. In statistics, classification refers to the problem of identifying which of a set of categories an observation or data value belongs to. For humans, classification can be very easy: given certain features and proper domain-specific knowledge, a person can usually do it with little effort. It can be tricky for a machine, however, unless it is properly trained on data with a suitable learning algorithm (classifier).