As you might know, supervised machine learning is one of the most commonly used and successful types of machine learning. In this article, we will describe supervised learning in more detail and explain several popular supervised learning algorithms. Remember that supervised learning is used whenever we want to predict a certain outcome from a given input, and we have examples of input/output pairs. We build a machine learning model from these input/output pairs, which comprise our training set. Our goal is to make accurate predictions for new, never-before-seen data. Supervised learning often requires human effort to build the training set, but afterwards automates and often speeds up an otherwise laborious or infeasible task. There are two major types of supervised machine learning problems, called classification and regression. In classification, the goal is to predict a class label, which is a choice from a predefined list of possibilities.
Unsupervised nearest neighbors is the foundation of many other learning methods, notably manifold learning and spectral clustering. Supervised neighbors-based learning comes in two flavors: classification for data with discrete labels, and regression for data with continuous labels. The principle behind nearest neighbor methods is to find a predefined number of training samples closest in distance to the new point, and predict the label from these. The number of samples can be a user-defined constant (k-nearest neighbor learning), or vary based on the local density of points (radius-based neighbor learning). The distance can, in general, be any metric measure: standard Euclidean distance is the most common choice. Neighbors-based methods are known as non-generalizing machine learning methods, since they simply "remember" all of its training data (possibly transformed into a fast indexing structure such as a Ball Tree or KD Tree.).
His research interests are in the area of large scale and online machine learning algorithms. He develops infinitely scalable machine learning algorithms for Amazon SageMaker. Amir Sadoughi is a Senior Software Development Engineer on the AWS AI SageMaker Algorithms team. He is passionate about technologies at the intersection of distributed systems and machine learning.
Most of the machine learning algorithms are parametric. What do we mean by parametric? Let's say if we are trying to model an linear regression model with one dependent variable and one independent variable. The best fit we are looking is the line equations with optimized parameters. The parameters could be the intercept and coefficient. For any classification algorithm, we will try to get a boundary.
K-Nearest Neighbors (K-NN) is one of the simplest machine learning algorithms. Like other machine learning techniques, it was inspired by human reasoning. For example, when something significant happens in your life, you memorize that experience and use it as a guideline for future decisions. Let me give you a scenario of a person dropping a glass. While the glass is falling, you've made the prediction that the glass will break when it hits the ground.