A downside of K-Nearest Neighbors is that you need to hang on to your entire training dataset. The Learning Vector Quantization algorithm (or LVQ for short) is an artificial neural network algorithm that lets you choose how many training instances to hang onto and learns exactly what those instances should look like. In this post you will discover the Learning Vector Quantization algorithm. This post was written for developers and assumes no background in statistics or mathematics. The post focuses on how the algorithm works and how to use it for predictive modeling problems.

The Learning Vector Quantization (LVQ) algorithm is a lot like k-Nearest Neighbors. Predictions are made by finding the best match among a library of patterns. The difference is that the library of patterns is learned from the training data, rather than being the training patterns themselves. The patterns in the library are called codebook vectors, and the collection of them is called the codebook. The codebook vectors are initialized to randomly selected instances from the training dataset, then adjusted over a number of training epochs: for each training instance, the closest codebook vector (the best matching unit) is moved toward the instance if their class labels agree and away from it if they disagree.
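The process above can be sketched in a few lines of plain Python. This is a minimal, illustrative LVQ1 implementation under the assumption that each data row is a list of numeric features with the class label as its last element; the function names and the decaying learning-rate schedule are choices for this sketch, not a fixed part of the algorithm.

```python
import math
import random

def euclidean(a, b):
    # Euclidean distance over the feature part of two rows (label excluded)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(codebook, row):
    # Best matching unit: the codebook vector closest to the input row
    return min(codebook, key=lambda c: euclidean(c[:-1], row[:-1]))

def train_lvq(data, n_codebooks, lrate=0.3, epochs=10):
    # Initialize codebook vectors to randomly selected training rows
    codebook = [list(random.choice(data)) for _ in range(n_codebooks)]
    for epoch in range(epochs):
        # Linearly decay the learning rate across epochs
        rate = lrate * (1.0 - epoch / float(epochs))
        for row in data:
            bmu = nearest(codebook, row)
            for i in range(len(row) - 1):
                error = row[i] - bmu[i]
                if bmu[-1] == row[-1]:
                    bmu[i] += rate * error   # same class: move closer
                else:
                    bmu[i] -= rate * error   # different class: move away
    return codebook

def predict(codebook, row):
    # Predict the class of the best matching codebook vector
    return nearest(codebook, row)[-1]
```

Note that `n_codebooks` is the knob the post mentions: it directly controls how many "training instances" you hang on to, regardless of the size of the original dataset.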

It seems to me that the above definition of the k-fold cross-validation algorithm (from the Deep Learning book by Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016) is inconsistent with the common definition of cross-validation. In the above algorithm, the vector $e$ holds the loss computed for each individual example in the dataset $D$, and the mean of $e$ is taken as the estimate of the generalization error. In the standard definition of cross-validation, by contrast, we compute the test error for each fold and then average those per-fold errors.
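The difference between the two aggregations is easy to demonstrate numerically. The snippet below uses made-up per-example losses with deliberately unequal fold sizes; when all folds have the same size, the two definitions coincide, and they diverge otherwise because the per-fold average gives each fold, rather than each example, equal weight.

```python
# Two ways of aggregating k-fold cross-validation error:
# (1) mean of per-example losses pooled over all folds (as in the book),
# (2) mean of per-fold average losses (the common definition).
# Hypothetical per-example losses; fold sizes are unequal on purpose.
folds = [[0.1, 0.2, 0.3], [0.4, 0.6]]

pooled = [e for fold in folds for e in fold]
mean_of_examples = sum(pooled) / len(pooled)        # definition (1), ~0.32

fold_means = [sum(f) / len(f) for f in folds]
mean_of_folds = sum(fold_means) / len(fold_means)   # definition (2), ~0.35
```

With equal-size folds the weights per example are identical in both formulas, which is presumably why the two definitions are often treated as interchangeable.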

Text data requires special preparation before you can start using it for predictive modeling. The text must be parsed to extract words, a step called tokenization. Then the words need to be encoded as integers or floating-point values for use as input to a machine learning algorithm, a step called feature extraction (or vectorization). The scikit-learn library offers easy-to-use tools to perform both tokenization and feature extraction of your text data. In this tutorial, you will discover exactly how you can prepare your text data for predictive modeling in Python with scikit-learn.
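As a first taste of what this looks like, here is a short sketch using scikit-learn's `CountVectorizer`, which performs both steps at once: it tokenizes the documents and encodes each one as a vector of word counts. The two example documents are made up for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["The quick brown fox", "jumped over the lazy dog"]

vectorizer = CountVectorizer()      # tokenizes text and builds a vocabulary
X = vectorizer.fit_transform(docs)  # sparse document-term count matrix

print(sorted(vectorizer.vocabulary_))  # learned vocabulary (lowercased tokens)
print(X.toarray())                     # one row of integer counts per document
```

By default the vectorizer lowercases the text and drops single-character tokens, so the two documents above yield a vocabulary of eight distinct words and `X` has shape `(2, 8)`.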

Support Vector Machine (SVM) is a supervised machine learning algorithm that can be used for classification as well as regression problems. It is said to be one of the most popular high-performance algorithms and is implemented in practice using a kernel. The algorithm learns the classes from a labeled training dataset so that it can classify new data. It works by finding the line (more generally, the hyperplane) that best separates the data into classes, maximising the distance between the hyperplane and the nearest points of each class; this is referred to as margin maximisation.
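A minimal sketch of this with scikit-learn's `SVC` looks as follows; the two-cluster 2D dataset is made up, and the linear kernel is chosen here only because the clusters are linearly separable (the library's default kernel is `"rbf"`).

```python
from sklearn.svm import SVC

# Tiny made-up dataset: two well-separated clusters in 2D
X = [[0, 0], [1, 1], [8, 8], [9, 9]]
y = [0, 0, 1, 1]

clf = SVC(kernel="linear")  # fit a maximum-margin linear separator
clf.fit(X, y)

print(clf.predict([[0.5, 0.5], [8.5, 8.5]]))  # one point near each cluster
```

The fitted model keeps only the support vectors, the training points closest to the separating hyperplane, which are what determine the margin.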