scikit-learn and Game of Thrones - DZone Big Data
In my last post, I showed how to find similar Game of Thrones episodes based on the characters that appear in each episode. This allowed us to find similar episodes on an episode-by-episode basis, but I was curious whether there were groups of similar episodes that we could identify. A clustering algorithm groups similar documents together, where similarity is based on calculating a 'distance' between documents. Here each episode is a document: episodes separated by a small distance would be in the same cluster, whereas if there's a large distance between episodes then they'd probably be in different clusters. The KMeans algorithm clusters data by trying to separate samples into n groups of equal variance, minimizing a criterion known as the inertia or within-cluster sum-of-squares.
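To make the idea concrete, here is a minimal sketch of clustering episodes with scikit-learn's `KMeans`. The episode-by-character matrix, the episode names, and the choice of two clusters are made up for illustration and are not the data from the post; the point is only to show how cluster labels and the inertia come out of the model.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy data (not real show data): one row per episode,
# one column per character, 1.0 if the character appears in the episode.
episode_names = ["ep_a", "ep_b", "ep_c", "ep_d", "ep_e", "ep_f"]
X = np.array([
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1, 1],
], dtype=float)

# n_clusters=2 is an assumption for this toy example; random_state fixes
# the centroid initialisation so repeated runs give the same result.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = model.labels_

# Inertia is the within-cluster sum of squares: the squared Euclidean
# distance from each episode to the centroid of its assigned cluster.
manual_inertia = float(np.sum((X - model.cluster_centers_[labels]) ** 2))

for name, label in zip(episode_names, labels):
    print(name, "-> cluster", label)
print("inertia:", model.inertia_, "recomputed:", manual_inertia)
```

Episodes that share most of their characters end up with the same label, and the recomputed value should match `model.inertia_`. On real data the number of clusters is not known in advance, so it would have to be chosen by experiment.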
Sep-10-2016, 18:00:39 GMT