Clustering Mixed Datasets Using Homogeneity Analysis with Applications to Big Data
Sambasivan, Rajiv; Das, Sourish
Datasets with a mixture of categorical and numerical attributes are pervasive in applications from business and socioeconomic settings. Clustering these datasets is an important activity in their analysis. Researchers have developed techniques for clustering such datasets; see, for example, [1], [2] and [3]. These techniques either prescribe a probabilistic generative model [4] or use a dissimilarity measure [5] to compute a dissimilarity matrix that is then clustered. Each of these approaches has issues that need to be addressed when it is applied to big datasets, that is, datasets with a large number of instances compared to attributes.
Oct-30-2017
- Country:
  - Asia
    - India > Tamil Nadu
      - Chennai (0.04)
    - Singapore (0.04)
  - Europe
    - Austria > Vienna (0.14)
    - Netherlands > North Holland
      - Amsterdam (0.04)
  - North America > United States (0.14)
- Genre:
  - Research Report (1.00)
- Industry:
- Technology: