Clustering is sometimes called "unsupervised classification", a term I have mixed feelings about for reasons I will cover shortly, but it describes the problem well enough to be worth unpacking. First, the problem is unsupervised: we won't have a labeled dataset to guide our logic. Second, we are looking to separate items into classes based on the predictors (technically they are not predictors but "features" here, because there is no response). The difference is that in supervised classification the class structure is known and labeled, whereas in clustering we are inventing the class structure from the feature values alone. In supervised classification we used the labels to single out one class and looked for predictors that had two qualities: 1) their values were fairly common to every example of that class, and 2) they separated that class from the others.
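
To make the contrast concrete, here is a minimal sketch on a made-up one-feature dataset. Everything in it is an assumption for illustration: the data values, the function names `class_centers` and `two_means`, and the choice of a simple two-center k-means-style loop as the clustering step (the text above has not committed to any particular algorithm). The supervised function is handed the labels; the clustering function sees only the feature values and has to invent the groups itself.

```python
from statistics import mean

# Hypothetical toy data: one feature value per item.
features = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8]
# Labels exist only in the supervised setting.
labels = ["a", "a", "a", "b", "b", "b"]


def class_centers(xs, ys):
    """Supervised view: the labels tell us which items belong together."""
    return {c: mean(x for x, y in zip(xs, ys) if y == c) for c in sorted(set(ys))}


def two_means(xs, iterations=10):
    """Unsupervised view: invent two groups from the feature values alone.

    Assumes xs contains at least two distinct values.
    """
    lo, hi = min(xs), max(xs)  # deterministic starting centers
    for _ in range(iterations):
        # Assign each item to the nearer center (0 = low group, 1 = high group).
        groups = [0 if abs(x - lo) <= abs(x - hi) else 1 for x in xs]
        # Move each center to the mean of the items assigned to it.
        lo = mean(x for x, g in zip(xs, groups) if g == 0)
        hi = mean(x for x, g in zip(xs, groups) if g == 1)
    return groups, (lo, hi)


supervised = class_centers(features, labels)
clusters, centers = two_means(features)
print(supervised)
print(clusters, centers)
```

On this toy data the invented groups happen to line up with the labeled classes, because the feature has both qualities listed above: its values are close together within each class and far apart between classes. Nothing guarantees that on real data, and the cluster numbers 0 and 1 carry no meaning of their own.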
Oct-28-2019, 23:38:31 GMT