Accelerating the k-means++ Algorithm by Using Geometric Information

Corominas, Guillem Rodríguez, Blesa, Maria J., Blum, Christian

Aug-23-2024–arXiv.org Artificial Intelligence

The k-means clustering is a widely used method in data clustering and unsupervised machine learning, aiming to divide a given dataset into k distinct, non-overlapping clusters. This division seeks to minimize the within-cluster variance. The k-means clustering problem becomes NP-hard when extended beyond a single dimension [3]. Despite this complexity, there are algorithms designed to find sufficiently good solutions within a reasonable amount of time. Among these, Lloyd's algorithm, also referred to as the standard algorithm or batch k-means, is the most renowned [42]. The k-means algorithm is one of the most popular algorithms in data mining [58, 32], mainly due to its simplicity, scalability, and guaranteed termination. However, its performance is highly sensible to the initial placement of the centers [5]. In fact, there is no general approximation expectation for Lloyd's algorithm that applies to all scenarios, i.e., an arbitrary initialization may lead to an arbitrarily bad clustering. Therefore, it is crucial to employ effective initialization methods [24].

algorithm, calculation, norm filter, (14 more...)

arXiv.org Artificial Intelligence

Aug-23-2024

arXiv.org PDF

Add feedback

Country:
- South America > Chile
  - Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > Canada
  - Ontario > Toronto (0.14)
- Europe
  - Switzerland (0.04)
  - Spain > Catalonia
    - Barcelona Province > Barcelona (0.04)
  - Germany > Baden-Württemberg
    - Karlsruhe Region > Karlsruhe (0.04)

Genre:
- Research Report (1.00)

Industry:
- Energy (0.46)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found