As you might know, supervised machine learning is one of the most commonly used and successful types of machine learning. In this article, we will describe supervised learning in more detail and explain several popular supervised learning algorithms. Remember that supervised learning is used whenever we want to predict a certain outcome from a given input, and we have examples of input/output pairs. We build a machine learning model from these input/output pairs, which comprise our training set. Our goal is to make accurate predictions for new, never-before-seen data. Supervised learning often requires human effort to build the training set, but afterwards automates and often speeds up an otherwise laborious or infeasible task. There are two major types of supervised machine learning problems, called classification and regression. In classification, the goal is to predict a class label, which is a choice from a predefined list of possibilities.
Unsupervised nearest neighbors is the foundation of many other learning methods, notably manifold learning and spectral clustering. Supervised neighbors-based learning comes in two flavors: classification for data with discrete labels, and regression for data with continuous labels. The principle behind nearest neighbor methods is to find a predefined number of training samples closest in distance to the new point, and predict the label from these. The number of samples can be a user-defined constant (k-nearest neighbor learning), or vary based on the local density of points (radius-based neighbor learning). The distance can, in general, be any metric; standard Euclidean distance is the most common choice. Neighbors-based methods are known as non-generalizing machine learning methods, since they simply "remember" all of their training data (possibly transformed into a fast indexing structure such as a Ball Tree or KD Tree).
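To make the principle concrete, here is a minimal sketch of k-nearest neighbor classification in plain NumPy. It assumes Euclidean distance and a brute-force search over the stored training set (no Ball Tree or KD Tree); the function name `knn_predict` and the toy data are made up for illustration.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Predict the label of x_new by majority vote among its k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    # Euclidean distance from x_new to every stored training sample
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k closest training samples
    nearest = np.argsort(distances)[:k]
    # Majority vote over the neighbors' labels
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two small clusters labeled "a" and "b"
X_train = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
           [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]]
y_train = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(X_train, y_train, [0.05, 0.05], k=3))
print(knn_predict(X_train, y_train, [1.0, 0.95], k=3))
```

Note that there is no training step: the "model" is the stored data itself, and all the work happens at prediction time.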
His research interests are in the area of large scale and online machine learning algorithms. He develops infinitely scalable machine learning algorithms for Amazon SageMaker. Amir Sadoughi is a Senior Software Development Engineer on the AWS AI SageMaker Algorithms team. He is passionate about technologies at the intersection of distributed systems and machine learning.
Things are about to get a little… wiggly. In contrast to the methods we've covered so far -- linear regression, logistic regression, and SVMs where the form of the model was pre-defined -- non-parametric learners do not have a model structure specified a priori. We don't speculate about the form of the function f that we're trying to learn before training the model, as we did previously with linear regression. Instead, the model structure is purely determined from the data. These models are more flexible to the shape of the training data, but this sometimes comes at the cost of interpretability.
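As a rough illustration of a model whose shape comes purely from the data, the sketch below uses k-nearest neighbor regression on samples of a sine curve. Nothing in the code says the target is a sine wave; predictions are simply averages of nearby training targets. The function name `knn_regress`, the choice of k, and the synthetic data are assumptions made for this example.

```python
import numpy as np

def knn_regress(x_train, y_train, x_new, k=3):
    """Predict a value at x_new as the mean target of the k nearest training inputs."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    # Indices of the k training inputs closest to x_new (1-D inputs)
    nearest = np.argsort(np.abs(x_train - x_new))[:k]
    # Average the neighbors' targets; no functional form is assumed
    return y_train[nearest].mean()

# Synthetic training data sampled from a curve the model knows nothing about
x_train = np.linspace(0.0, 2 * np.pi, 50)
y_train = np.sin(x_train)

# Predictions follow the wiggle of the data: near the peak and near the trough
print(knn_regress(x_train, y_train, np.pi / 2))
print(knn_regress(x_train, y_train, 3 * np.pi / 2))
```

A single straight line could not track both the peak and the trough of this data, which is the flexibility the paragraph above describes.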
Many machine learning algorithms are parametric. What do we mean by parametric? Suppose we are fitting a linear regression model with one dependent variable and one independent variable. The best fit we are looking for is the equation of a line with optimized parameters: here, the intercept and the coefficient. Similarly, for a classification algorithm, we try to learn a decision boundary.
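For the one-variable linear regression case, the whole model reduces to two numbers. The sketch below fits them with NumPy's least-squares polynomial fit; the data are synthetic and noise-free (generated from y = 2x + 1) so that the recovered parameters are easy to check, and the helper name `predict` is made up for this example.

```python
import numpy as np

# Synthetic data from a known line, y = 2x + 1, with no noise
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Degree-1 least-squares fit; polyfit returns [coefficient, intercept]
coefficient, intercept = np.polyfit(x, y, deg=1)

def predict(x_new):
    """Apply the fitted line; the training data is no longer needed."""
    return coefficient * x_new + intercept

print(coefficient, intercept)
print(predict(5.0))
```

Once the two parameters are fitted, the training data can be discarded, which is the practical difference from a method that has to keep every training sample around.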