Fast clustering algorithms for massive datasets
How do you represent these keywords, with their cluster structure determined by d(A, B), in a nice graph? Roughly 10 million keywords would fit in an image of about 3,000 x 3,000 pixels, at one keyword per pixel. For those interested in graphical representations, see the Fruchterman and Reingold algorithm, extensively used to produce graphs similar to the one below. Note, though, that its computational complexity is O(n^3), so it needs to be improved very significantly for this keyword clustering application, and so does the graphical representation. The graphical representation could be a raster image with millions of pixels, like a heat map where color represents category and where pointing to a pixel shows the keyword it stands for (rather than a vector image with dozens of nodes, see graph below). Neighboring pixels would represent strongly related keywords.
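To make the raster idea concrete, here is a minimal sketch in Python. It is not the method from the article: it assumes each keyword already carries a category label from a clustering step, and it ignores d(A, B) beyond that label. The function name raster_layout and the sample keywords are hypothetical. Keywords are sorted by category and written in snake order (left to right, then right to left on the next row), so consecutive keywords sit on adjacent pixels and each category forms a contiguous band.

```python
import math


def raster_layout(keywords, categories):
    """Place keywords on a square grid, one keyword per pixel.

    Assumption: categories[i] is the cluster label of keywords[i],
    produced by some earlier clustering step (not shown here).
    Returns (side, grid), where grid[row][col] is a
    (keyword, category) pair, or None for an unused pixel.
    """
    if len(keywords) != len(categories):
        raise ValueError("keywords and categories must have the same length")
    n = len(keywords)
    # Smallest side such that side * side >= n.
    side = math.isqrt(n - 1) + 1 if n > 0 else 0
    # Sort by category first so each category occupies a contiguous run.
    order = sorted(range(n), key=lambda i: (categories[i], keywords[i]))
    grid = [[None] * side for _ in range(side)]
    for pos, i in enumerate(order):
        row, offset = divmod(pos, side)
        # Snake order: reverse direction on odd rows so that consecutive
        # keywords always land on neighboring pixels.
        col = offset if row % 2 == 0 else side - 1 - offset
        grid[row][col] = (keywords[i], categories[i])
    return side, grid


# Hypothetical sample data: five keywords in two categories.
sample_keywords = ["car insurance", "auto loan", "home loan",
                   "cheap flights", "hotel deals"]
sample_categories = [0, 0, 0, 1, 1]
side, grid = raster_layout(sample_keywords, sample_categories)
for row in grid:
    print(row)
```

The layout costs one sort, O(n log n), plus one pass over the keywords. With 10 million keywords the grid side computed this way is 3,163 pixels. A fuller version would order keywords within a category by similarity and map each category to a color, with the grid doubling as the lookup table for the point-to-reveal behavior.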
Nov-14-2016, 04:33:39 GMT