Must-Know: How to determine the most useful number of clusters?
In unsupervised learning, class attributes and explicit class membership do not exist. In fact, one of the dominant forms of unsupervised learning, data clustering, aims to approximate class membership by minimizing interclass instance similarity and maximizing intraclass similarity. This raises the question in the title: how many clusters are most useful? We will look at two particularly popular methods for attempting to answer it: the elbow method and the silhouette method.

The elbow method plots the variance within the clusters against the number of clusters and looks for the "elbow", the point after which adding more clusters stops reducing that variance substantially. It should be self-evident that, in order to plot this variance against varying numbers of clusters, varying numbers of clusters must be tested.

The silhouette method measures how similar an object is to its own cluster (called cohesion) compared with how similar it is to other clusters (called separation).
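Both methods can be sketched in a few lines of pure Python. This is a minimal illustration, not the article's own code. Several choices are assumptions: k-means (Lloyd's algorithm with k-means++ seeding and a few restarts) is used as the clustering algorithm, the within-cluster sum of squared errors (SSE) stands in for "variance", a singleton cluster gets a silhouette of 0 by convention, and the data are synthetic 2-D blobs. All function names (`make_blobs`, `kmeans`, `sse`, `silhouette`) are hypothetical helpers defined here.

```python
import math
import random


def make_blobs(centers, n_per_center, spread, seed=0):
    """Generate toy 2-D Gaussian blobs (synthetic data, for illustration only)."""
    rng = random.Random(seed)
    return [
        (cx + rng.gauss(0, spread), cy + rng.gauss(0, spread))
        for cx, cy in centers
        for _ in range(n_per_center)
    ]


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]


def sse(points, labels, centroids):
    """Within-cluster sum of squared distances: the quantity the elbow method plots."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: each new centroid is drawn with probability ~ squared distance."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        d2 = [min(math.dist(p, c) ** 2 for c in centroids) for p in points]
        centroids.append(rng.choices(points, weights=d2)[0])
    return centroids


def kmeans(points, k, n_init=10, max_iter=100, seed=0):
    """Lloyd's k-means with several k-means++ restarts; keeps the lowest-SSE run."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):
        centroids = kmeans_pp_init(points, k, rng)
        for _ in range(max_iter):
            labels = assign(points, centroids)
            new = []
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                # Move the centroid to the mean of its members; keep it if the cluster is empty.
                new.append(tuple(sum(xs) / len(members) for xs in zip(*members))
                           if members else centroids[j])
            if new == centroids:  # converged: assignments no longer change
                break
            centroids = new
        labels = assign(points, centroids)
        err = sse(points, labels, centroids)
        if best is None or err < best[0]:
            best = (err, labels, centroids)
    return best[1], best[2]


def silhouette(points, labels):
    """Mean silhouette s(i) = (b - a) / max(a, b) over all points."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least 2 non-empty clusters")
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # Cohesion a: mean distance to the other members of p's own cluster
        # (the zero distance from p to itself adds nothing to the sum).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # Separation b: mean distance to the members of the nearest other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    data = make_blobs([(0, 0), (10, 0), (0, 10)], n_per_center=20, spread=1.0, seed=42)
    for k in range(2, 7):
        labels, centroids = kmeans(data, k)
        print(f"k={k}  SSE={sse(data, labels, centroids):8.2f}  "
              f"silhouette={silhouette(data, labels):.3f}")
```

On three well-separated blobs like these, one would expect the SSE curve to drop steeply up to k = 3 and flatten afterwards (the elbow), and the mean silhouette to peak at k = 3. On real data the elbow is often less sharp, which is one reason to consult both criteria.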
May-21-2017, 18:45:41 GMT