A High-Performance External Validity Index for Clustering with a Large Number of Clusters
Karbasian, Mohammad Yasin; Javadi, Ramin
arXiv.org Artificial Intelligence
This paper introduces the Stable Matching Based Pairing (SMBP) algorithm, a high-performance external validity index for evaluating clusterings of large-scale datasets with a large number of clusters. SMBP uses the stable matching framework to pair clusters across different clustering methods, reducing the computational complexity to $O(N^2)$, compared with $O(N^3)$ for traditional Maximum Weighted Matching (MWM). In comprehensive evaluations on real-world and synthetic datasets, SMBP achieves accuracy comparable to MWM with superior computational efficiency. It is particularly effective on balanced, unbalanced, and large-scale datasets with many clusters, making it a scalable and practical solution for modern clustering tasks. SMBP is also easy to implement within machine learning frameworks such as PyTorch and TensorFlow, offering a robust tool for big data applications. Extensive experiments validate the algorithm and show its potential as an alternative to existing methods such as Maximum Match Measure (MMM) and Centroid Ratio (CR).
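To make the pairing idea concrete, here is a minimal Python sketch of a stable-matching-based external index. It is not the authors' implementation: the abstract does not specify the preference lists or the scoring rule, so the sketch assumes that clusters rank each other by the number of shared points and that the index is the fraction of points covered by the matched pairs. The function names (`overlap_matrix`, `stable_pairing`, `stable_matching_index`) are hypothetical, and $N$ is taken here to mean the number of clusters.

```python
def overlap_matrix(labels_a, labels_b):
    """Count shared points for every (cluster in A, cluster in B) pair."""
    ids_a = {c: i for i, c in enumerate(sorted(set(labels_a)))}
    ids_b = {c: j for j, c in enumerate(sorted(set(labels_b)))}
    overlap = [[0] * len(ids_b) for _ in ids_a]
    for a, b in zip(labels_a, labels_b):
        overlap[ids_a[a]][ids_b[b]] += 1
    return overlap


def stable_pairing(overlap):
    """Gale-Shapley: clusters of A propose to clusters of B.

    Assumption: both sides prefer partners with larger overlap.
    The proposal loop makes at most rows * cols proposals; building the
    sorted preference lists adds a log factor in this simple version.
    """
    n_rows, n_cols = len(overlap), len(overlap[0])
    prefs = [sorted(range(n_cols), key=lambda j: -overlap[i][j])
             for i in range(n_rows)]
    next_choice = [0] * n_rows      # next column each row will propose to
    holder = [None] * n_cols        # row currently accepted by each column
    free = list(range(n_rows))
    while free:
        i = free.pop()
        if next_choice[i] >= n_cols:
            continue                # row i has been rejected by every column
        j = prefs[i][next_choice[i]]
        next_choice[i] += 1
        current = holder[j]
        if current is None:
            holder[j] = i
        elif overlap[i][j] > overlap[current][j]:
            holder[j] = i           # column j trades up; old partner is free
            free.append(current)
        else:
            free.append(i)          # rejected; row i tries its next choice
    return {holder[j]: j for j in range(n_cols) if holder[j] is not None}


def stable_matching_index(labels_a, labels_b):
    """Fraction of points that fall in matched cluster pairs (assumed score)."""
    overlap = overlap_matrix(labels_a, labels_b)
    pairs = stable_pairing(overlap)
    matched = sum(overlap[i][j] for i, j in pairs.items())
    return matched / len(labels_a)
```

For example, `stable_matching_index([0, 0, 0, 1, 1, 2, 2, 2], [1, 1, 0, 0, 0, 2, 2, 2])` pairs each cluster with its largest-overlap counterpart and scores 7 of the 8 points as matched. A stable matching is not guaranteed to maximize total overlap the way a maximum weighted matching does, which is the accuracy-for-speed trade-off the abstract refers to.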
Sep-22-2024
- Country:
- Asia > Middle East
- Iran > Isfahan Province > Isfahan (0.04)
- Europe > United Kingdom
- England > Cambridgeshire > Cambridge (0.04)
- North America > United States
- Michigan (0.04)
- New York > New York County
- New York City (0.04)
- Genre:
- Research Report > New Finding (0.46)
- Technology: