Clustering idea for very large datasets

Apr-22-2016, 10:00:27 GMT–@machinelearnbot

Let's say you have to cluster 10 million points, for instance keywords. You can perform k-NN (k-nearest neighbors) clustering or some other type of clustering, which is typically O(n^2) or worse in computational complexity. Has anyone ever used a clustering method based on sampling? The idea is to start by sampling 1% (or less) of the 10,000,000 entries and to perform clustering on that sample, comparing pairs of sampled keywords, to create a "seed" or "baseline" cluster structure. The next step is to go sequentially through all 10,000,000 keywords and, for each keyword, find the closest cluster in the baseline cluster structure.
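The two steps described above could be sketched roughly as follows. This is only an illustration under several assumptions that are not in the original post: the keywords are assumed to be already converted to numeric vectors (tuples of floats), a plain k-means stands in for whatever clustering method is run on the sample, "closest cluster" is taken to mean nearest centroid in Euclidean distance, and the names sample_then_assign, kmeans, nearest and sq_dist are made up for this example. With a sample of m points, a quadratic method costs on the order of m^2 instead of n^2, and the final pass costs on the order of n times the number of clusters.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; a stand-in for any clustering run on the sample."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # New centroid = mean of its group; keep the old one if the group is empty.
        centroids = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids

def sample_then_assign(points, k, sample_frac=0.01, seed=0):
    """Step 1: cluster a small random sample to get the baseline structure.
    Step 2: one sequential pass assigning every point to its closest cluster."""
    rng = random.Random(seed)
    n_sample = min(len(points), max(k, int(len(points) * sample_frac)))
    sample = rng.sample(points, n_sample)
    centroids = kmeans(sample, k, seed=seed)          # baseline ("seed") clusters
    labels = [nearest(p, centroids) for p in points]  # sequential assignment pass
    return centroids, labels
```

A call such as sample_then_assign(points, k=50) returns the baseline centroids and one cluster index per point. Whether a 1% sample captures small or rare clusters is an open question that this sketch does not address.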

artificial intelligence, keyword, machine learning


