Clustering Mixed Datasets Using Homogeneity Analysis with Applications to Big Data

Sambasivan, Rajiv, Das, Sourish

arXiv.org Machine Learning 

Datasets with a mixture of categorical and numerical attributes are pervasive in applications from business and socioeconomic settings. Clustering these datasets is an important activity in their analysis. Techniques to cluster these datasets have been developed by researchers, see for example [1], [2] and [3]. Techniques to cluster mixed datasets either prescribe a probabilistic generative model [4] or use a dissimilarity measure [5] to compute a dissimilarity matrix that is then clustered. Each of these approaches have issues that need to be addressed when they are applied to big datasets - datasets with a large number of instances compared to attributes.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found