By now, anyone who reads virtually any trade magazine has heard incessantly about how machine learning is going to transform their industry in profound ways. Marketers will be able to read potential customers' minds, farms will produce unprecedented yields, and doctors will be able to stop diseases before they begin to form. And of course, we have all heard how machine learning will eventually take our jobs. It may well be said of machine learning that never have so many wild predictions been made about something of which the majority of the public knows so little. So what exactly is machine learning? What can we reasonably expect in the next ten years? And, of course, the question that has been plaguing us all: will the machines rise up and destroy us?
Technically speaking, the terms supervised and unsupervised learning refer to whether the raw data used to train an algorithm has been labeled in advance. In supervised learning, data scientists feed the algorithm labeled training data and define the variables they want it to assess for correlations. Both the input and the desired output of the algorithm are specified in the training data. For example, to train an algorithm with supervised learning to infer whether a picture has a cat in it, data scientists label each picture in the training data to indicate whether it contains a cat. In an unsupervised learning approach, the algorithm is trained on unlabeled data.
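To make the labeled versus unlabeled distinction concrete, here is a minimal sketch in plain Python. It is not a real image model: the two numeric features per picture (`ear_pointiness`, `whisker_score`), their values, and the nearest-centroid classifier are all assumptions chosen for illustration. The point is only that the supervised step cannot run without the labels, while the unlabeled view of the same data has nothing to learn a label from.

```python
from collections import defaultdict
from math import dist

# Hypothetical toy features for each picture: (ear_pointiness, whisker_score).
# Feature names and values are invented for illustration only.
labeled_data = [
    ((0.9, 0.8), "cat"),
    ((0.8, 0.9), "cat"),
    ((0.2, 0.1), "not_cat"),
    ((0.1, 0.3), "not_cat"),
]

# The same pictures as an unsupervised algorithm would see them: no labels.
unlabeled_data = [features for features, _ in labeled_data]


def fit_centroids(examples):
    """Average the feature vectors of each label (requires labeled data)."""
    grouped = defaultdict(list)
    for features, label in examples:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(column) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest to the new example."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


centroids = fit_centroids(labeled_data)
prediction = predict(centroids, (0.85, 0.7))  # a new, unseen picture
```

With these toy numbers the new picture sits closest to the average of the examples labeled "cat", so that is the label the sketch returns.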
If you are venturing into machine learning, you should know about supervised and unsupervised learning. People often find it difficult to draw a line between the two. On the surface, both learning processes appear to follow the same procedure, which makes it harder for a learner to tell them apart. Here, you will learn the differences between these two types of machine learning.
Clustering is sometimes called "unsupervised classification", a term I have mixed feelings about for reasons I will cover shortly, but it explains the problem well enough to be worth covering. First, the problem is unsupervised: we won't have a labeled dataset to guide our logic. Second, we are looking to separate items into classes based on the predictors (technically they are not predictors but "features" here, because there is no response). The difference is that in supervised classification the class structure is known and labeled, whereas in clustering we are inventing the class structure from the feature values alone. In supervised classification we used the labels to single out one class and looked for predictors that had two qualities: 1) they had fairly similar values for every example of that class, and 2) they separated that class from the others.
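As one illustration of inventing a class structure from feature values alone, here is a small k-means sketch in plain Python. K-means is only one of many clustering methods and is not named in the text above. The sample points, the choice of k = 2, and seeding the centroids with the first k points are assumptions made to keep the example short and deterministic.

```python
from math import dist


def k_means(points, k, iterations=10):
    """Group points into k clusters using feature values only (no labels)."""
    # Assumption: seed the centroids with the first k points so the sketch
    # is deterministic; practical implementations initialise more carefully.
    centroids = list(points[:k])
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(
                    sum(column) / len(column) for column in zip(*members)
                )
    return assignments, centroids


# Hypothetical two-feature items; note that no labels are supplied.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (7.8, 8.3), (0.9, 1.1), (8.1, 7.9)]
assignments, centroids = k_means(points, k=2)
```

The cluster indices that come back are arbitrary names the algorithm made up. Nothing in the data says what cluster 0 or cluster 1 means, which is one reason the phrase "unsupervised classification" deserves some caution.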
The one point I want to emphasize here is that the adjective "unsupervised" does not mean that these algorithms run by themselves without human supervision. It simply indicates the absence of a desired or ideal output corresponding to each input. An analyst (or data scientist) training an unsupervised learning model has to exercise the same kind of modeling discipline as one training a supervised model. Likewise, that analyst can exercise a similar amount of control over the resulting output by configuring model parameters. While supervised algorithms derive a mapping function from x to y so as to accurately estimate the y's corresponding to new x's, unsupervised algorithms employ predefined distance or similarity functions to map the distribution of the input x's.
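A small sketch can show how much the analyst's configuration shapes an unsupervised result. The function below is a hypothetical, deliberately simple one-dimensional grouping rule, not a standard library routine: it uses the gap between neighbouring values as its predefined distance, and `max_gap` is the parameter the analyst chooses. The sample values are invented.

```python
def cluster_by_gap(values, max_gap):
    """Group 1-D values, starting a new cluster when a gap exceeds max_gap."""
    ordered = sorted(values)
    if not ordered:
        return []
    clusters = [[ordered[0]]]
    for value in ordered[1:]:
        # Predefined distance: difference from the previous (sorted) value.
        if value - clusters[-1][-1] <= max_gap:
            clusters[-1].append(value)
        else:
            clusters.append([value])
    return clusters


values = [1.0, 1.5, 2.0, 6.0, 6.5, 12.0]  # hypothetical inputs, no labels

tight = cluster_by_gap(values, max_gap=1.0)  # stricter notion of "close"
loose = cluster_by_gap(values, max_gap=5.0)  # more permissive notion
```

The same unlabeled inputs yield three groups under the stricter setting and two under the looser one. No output was ever specified for any input, yet the analyst's choice of distance and threshold decides what structure the algorithm reports.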