A novel k-means clustering approach using two distance measures for Gaussian data

Nov-25-2025–arXiv.org Machine Learning

Clustering algorithms have long been the topic of research, representing the more popular side of unsupervised learning. Since clustering analysis is one of the best ways to find some clarity and structure within raw data, this paper explores a novel approach to \textit{k}-means clustering. Here we present a \textit{k}-means clustering algorithm that takes both the within cluster distance (WCD) and the inter cluster distance (ICD) as the distance metric to cluster the data into \emph{k} clusters pre-determined by the Calinski-Harabasz criterion in order to provide a more robust output for the clustering analysis. The idea with this approach is that by including both the measurement metrics, the convergence of the data into their clusters becomes solidified and more robust. We run the algorithm with some synthetically produced data and also some benchmark data sets obtained from the UCI repository. The results show that the convergence of the data into their respective clusters is more accurate by using both WCD and ICD measurement metrics. The algorithm is also better at clustering the outliers into their true clusters as opposed to the traditional \textit{k} means method. We also address some interesting possible research topics that reveal themselves as we answer the questions we initially set out to address.

algorithm, cluster center, variance, (13 more...)

arXiv.org Machine Learning

Nov-25-2025

arXiv.org PDF

Add feedback

Country:
- Europe > Italy (0.04)
- North America > United States
  - Wisconsin (0.04)

Genre:
- Research Report > New Finding (0.48)

Industry:
- Health & Medicine (0.70)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found