Fast clustering algorithms for massive datasets

Mar-31-2016, 13:06:19 GMT–#artificialintelligence

You gather a large number of keywords from the Internet with a web crawler (crawling Wikipedia or DMOZ directories), then compute the frequency of each keyword and of each "keyword pair". A "keyword pair" is two keywords found on the same web page, or close to each other on the same web page. By keyword, I mean something like "California insurance": a keyword usually contains more than one token, but rarely more than three. With all the frequencies, you can create a table (typically containing many millions of keywords, even after keyword cleaning), where each entry is a pair of keywords and three numbers, e.g.
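The counting step described above can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation. The function name `build_pair_table` and its input format (one list of already-extracted keywords per page) are hypothetical. The sketch also assumes that the three numbers stored with each pair are the frequency of the first keyword, the frequency of the second keyword, and the frequency of the pair, each counted once per page. It treats only "found on the same web page" as a pair and ignores the proximity variant.

```python
from collections import Counter
from itertools import combinations


def build_pair_table(pages):
    """Build a keyword-pair frequency table.

    pages: iterable of keyword lists, one list per crawled page
           (hypothetical input format; keyword extraction is assumed done).
    Returns rows of (keyword_a, keyword_b, freq_a, freq_b, freq_pair),
    where every frequency is a number of pages (an assumption).
    """
    keyword_freq = Counter()
    pair_freq = Counter()
    for keywords in pages:
        # Deduplicate and sort so each pair has one canonical ordering
        # and each keyword is counted at most once per page.
        unique = sorted(set(keywords))
        keyword_freq.update(unique)
        pair_freq.update(combinations(unique, 2))
    return [
        (a, b, keyword_freq[a], keyword_freq[b], n)
        for (a, b), n in pair_freq.items()
    ]


if __name__ == "__main__":
    sample_pages = [
        ["California insurance", "home insurance"],
        ["California insurance", "home insurance", "car insurance"],
        ["car insurance"],
    ]
    for row in build_pair_table(sample_pages):
        print(row)
```

Enumerating all pairs on a page grows quadratically with the number of keywords on that page, so on a real crawl one would likely cap the keywords kept per page or restrict pairs to keywords that appear close to each other.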

