Accelerating the k-means++ Algorithm by Using Geometric Information
Corominas, Guillem Rodríguez, Blesa, Maria J., Blum, Christian
–arXiv.org Artificial Intelligence
The k-means clustering is a widely used method in data clustering and unsupervised machine learning, aiming to divide a given dataset into k distinct, non-overlapping clusters. This division seeks to minimize the within-cluster variance. The k-means clustering problem becomes NP-hard when extended beyond a single dimension [3]. Despite this complexity, there are algorithms designed to find sufficiently good solutions within a reasonable amount of time. Among these, Lloyd's algorithm, also referred to as the standard algorithm or batch k-means, is the most renowned [42]. The k-means algorithm is one of the most popular algorithms in data mining [58, 32], mainly due to its simplicity, scalability, and guaranteed termination. However, its performance is highly sensible to the initial placement of the centers [5]. In fact, there is no general approximation expectation for Lloyd's algorithm that applies to all scenarios, i.e., an arbitrary initialization may lead to an arbitrarily bad clustering. Therefore, it is crucial to employ effective initialization methods [24].
arXiv.org Artificial Intelligence
Aug-23-2024
- Country:
- Europe
- Germany > Baden-Württemberg
- Karlsruhe Region > Karlsruhe (0.04)
- Spain > Catalonia
- Barcelona Province > Barcelona (0.04)
- Switzerland (0.04)
- Germany > Baden-Württemberg
- North America > Canada
- South America > Chile
- Europe
- Genre:
- Research Report (1.00)
- Industry:
- Energy (0.46)
- Technology: