Farthest-Point Heuristic based Initialization Methods for K-Modes Clustering

He, Zengyou

arXiv.org Artificial Intelligence 

The k -modes algorithm [1] extends the k -means paradigm to cluster categorical data by using (1) a simple matching dissimilarity measure for categorical objects, (2) modes instead of means for clusters, and (3) a frequency-based method to update modes in the k -means fashion to minimize the cost function of clustering. Because the k -modes algorithm uses the same clustering process as k -means, it preserves the efficiency of the k -means algorithm. Although the k -modes algorithm is very efficient, it suffers the problem that the clustering results are sensitive to the selection of the initial points. Hence, a better initial points selection procedure would improve the reliability and accuracy of clustering results. To that end, an iterative initial-points refinement algorithm for k -modes clustering has been presented in [2]. As shown in [2], the new initialization pr ocedure greatly improves the reliability and accuracy of final clustering results. Despite the su ccess of Ref. [2], the following observations motivate us to further pursue other alternative initialization methods.