Scalable Initialization Methods for Large-Scale Clustering

Hämäläinen, Joonas, Kärkkäinen, Tommi, Rossi, Tuomo

Jul-23-2020–arXiv.org Machine Learning

In this work, two new initialization methods for K-means clustering are proposed. Both proposals apply a divide-and-conquer approach to the K-means|| type of initialization strategy. The second proposal also uses multiple lower-dimensional subspaces, produced by the random projection method, for the initialization. The proposed methods are scalable and can be run in parallel, which makes them suitable for initializing large-scale problems. In the experiments, the proposed methods are compared to the K-means++ and K-means|| methods using an extensive set of reference and synthetic large-scale datasets. For the latter, a novel algorithm for generating high-dimensional clustering data is given. The experiments show that the proposed methods compare favorably to the state of the art. We also observe that the currently most popular initialization, K-means++, behaves like random initialization in very high-dimensional cases.
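The abstract does not spell out the proposed algorithms, so the following is only a rough illustration of the general divide-and-conquer idea, not the authors' method. It assumes a simple scheme: split the data into random subsets, pick K candidate seeds per subset with standard K-means++ (D^2) sampling, then reduce the pooled candidates to K seeds with a weighted K-means++ pass. The function names `kmeanspp_seeds` and `divide_and_conquer_init` and the parameter `n_parts` are hypothetical, and each subset is assumed to contain at least K points.

```python
import numpy as np

def kmeanspp_seeds(X, k, rng, weights=None):
    """Pick k rows of X by (optionally weighted) D^2 sampling, as in K-means++."""
    n = X.shape[0]
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    first = rng.choice(n, p=w / w.sum())
    chosen = [first]
    # Squared distance from every point to its nearest chosen seed so far.
    d2 = np.sum((X - X[first]) ** 2, axis=1)
    for _ in range(1, k):
        scores = w * d2
        total = scores.sum()
        # If every remaining score is zero, fall back to weight-proportional sampling.
        p = scores / total if total > 0 else w / w.sum()
        idx = rng.choice(n, p=p)
        chosen.append(idx)
        d2 = np.minimum(d2, np.sum((X - X[idx]) ** 2, axis=1))
    return X[chosen]

def divide_and_conquer_init(X, k, n_parts=4, seed=0):
    """Illustrative sketch only: seed per subset, then reduce the pooled candidates."""
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(X.shape[0]), n_parts)
    # Each subset is independent, so this step could run in parallel;
    # it is done sequentially here for simplicity.
    candidates = np.vstack([kmeanspp_seeds(X[p], k, rng) for p in parts])
    # Weight each candidate by the number of data points nearest to it.
    # (Dense distance matrix: fine for a small demo, not for large-scale data.)
    d2 = ((X[:, None, :] - candidates[None, :, :]) ** 2).sum(axis=2)
    counts = np.bincount(d2.argmin(axis=1), minlength=len(candidates))
    return kmeanspp_seeds(candidates, k, rng, weights=counts.astype(float))
```

Calling `divide_and_conquer_init(X, k)` on an `(n, d)` array returns a `(k, d)` array of initial centers, each of which is a row of `X`. The random-projection variant mentioned in the abstract is not covered by this sketch.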

artificial intelligence, dataset, machine learning

Country:
- Asia (0.04)
- North America > United States
  - Texas (0.04)
  - Massachusetts > Plymouth County
    - Norwell (0.04)
  - California
    - Santa Clara County > Palo Alto (0.04)
    - Alameda County > Oakland (0.04)
- Europe > Finland
  - Central Finland > Jyväskylä (0.04)
  - North Karelia > Joensuu (0.04)

Genre:
- Research Report (0.64)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
