Benchmarking of Clustering Validity Measures Revisited

Simpson, Connor, Campello, Ricardo J. G. B., Stojanovski, Elizabeth

arXiv.org Machine Learning 

Clustering is an unsupervised learning technique that aims to identify groups of similar or interrelated observations within data [39, 87]. Clustering algorithms are commonly categorised into three primary groups [39, 82]: partitioning algorithms such as K-Means [39] and Spectral Clustering [88], hierarchical algorithms such as Single Linkage [39] and HDBSCAN* [7, 8], and soft (fuzzy or probabilistic) algorithms such as Fuzzy c-Means (FCM) [4] and Expectation Maximisation with Gaussian Mixture Models (EM-GMM) [20].

Partitioning algorithms divide the data into a given number k of clusters, while hierarchical algorithms produce a sequence of nested partitions with incrementally varying numbers of clusters. Soft clustering algorithms resemble partitioning techniques, except that each observation is assigned a degree of membership or probability for each cluster, rather than a hard assignment to a single cluster.

Within these categories there are also algorithms that may not assign every observation to a cluster, owing to outlier trimming or noise detection; two examples are trimmed K-means [14] and the previously mentioned HDBSCAN*, each of which may produce solutions in which some observations remain unassigned.

Clustering validation, or validity, is an important step of the clustering process irrespective of the algorithm used [39, 25], as it is crucial for determining the best partition(s) produced and the number of clusters in the data [23].
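To make the partitioning category concrete, the following is a minimal sketch of Lloyd's algorithm for K-Means, in which every observation receives a hard assignment to exactly one of k clusters. This is an illustrative stdlib-only implementation, not the code used in any of the cited benchmarks; the function name `kmeans` and its parameters are our own.

```python
# Minimal sketch of Lloyd's algorithm for K-Means (a partitioning method):
# every observation is hard-assigned to exactly one of k clusters.
import random

def kmeans(points, k, iters=100, seed=0):
    rng = random.Random(seed)
    # Initialise centroids with k distinct observations chosen at random.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest
        # centroid (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda j: sum((p - c) ** 2
                                  for p, c in zip(pt, centroids[j])))
            for pt in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members)
                                           for dim in zip(*members)))
            else:  # keep the old centroid if a cluster empties out
                new_centroids.append(centroids[j])
        if new_centroids == centroids:  # converged: partition is stable
            break
        centroids = new_centroids
    return labels, centroids

# Two well-separated 2-D blobs: K-Means recovers the intended partition.
data = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
        (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels, centroids = kmeans(data, k=2)
```

A soft algorithm such as FCM or EM-GMM would instead return, for each observation, a membership weight or posterior probability per cluster that sums to one, rather than the single integer label per point produced here.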