Central Clustering of Categorical Data with Automated Feature Weighting

Chen, Lifei (Fujian Normal University) | Wang, Shengrui (University of Sherbrooke)

AAAI Conferences 

The ability to cluster high-dimensional categorical data is essential for many machine learning applications such as bioinformatics. Currently, central clustering of categorical data is a difficult problem due to the lack of a geometrically interpretable definition of a cluster center. In this paper, we propose a novel kernel-density-based definition using a Bayes-type probability estimator. We then propose a new algorithm, called k-centers, for central clustering of categorical data. It incorporates a new feature weighting scheme in which each attribute is automatically assigned a weight measuring its individual contribution to the clusters. Experimental results on real-world data show outstanding performance of the proposed algorithm, especially in recognizing the biological patterns in DNA sequences.
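
To make the idea of a probabilistic cluster center for categorical data concrete, here is a minimal sketch. The abstract does not give the form of the estimator, so the smoothing used here (mixing each category's relative frequency with a uniform distribution, controlled by a bandwidth) is an assumption chosen for illustration, not the paper's definition. The names `smoothed_center`, `categories`, and `bandwidth` are hypothetical, and the feature weighting scheme is not sketched because the abstract does not describe it.

```python
from collections import Counter


def smoothed_center(rows, categories, bandwidth=0.1):
    """Return one probability dict per attribute for a cluster of categorical rows.

    ASSUMPTION: each category's relative frequency is shrunk toward the uniform
    distribution by `bandwidth` (0 = raw frequencies, 1 = uniform). This is an
    illustrative smoothing, not the estimator defined in the paper.
    """
    if not rows:
        raise ValueError("cluster must contain at least one row")
    n = len(rows)
    center = []
    for d, cats in enumerate(categories):
        counts = Counter(row[d] for row in rows)  # missing categories count as 0
        m = len(cats)
        center.append({
            c: (1.0 - bandwidth) * counts[c] / n + bandwidth / m
            for c in cats
        })
    return center


# Toy cluster of two-position DNA fragments (made-up data).
cluster = [("A", "C"), ("A", "G"), ("T", "G"), ("A", "G")]
categories = [("A", "C", "G", "T"), ("A", "C", "G", "T")]
center = smoothed_center(cluster, categories, bandwidth=0.2)
```

Under this assumption the center is a list of probability vectors, one per attribute, rather than a single representative object such as a mode. Unobserved categories keep a small nonzero probability, which is the practical effect of the smoothing.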
