Clustering Mixed Datasets Using Homogeneity Analysis with Applications to Big Data

Oct-30-2017–arXiv.org Machine Learning

Datasets with a mixture of categorical and numerical attributes are pervasive in business and socioeconomic applications. Clustering these datasets is an important part of their analysis. Researchers have developed several techniques for clustering them; see, for example, [1], [2] and [3]. These techniques either prescribe a probabilistic generative model [4] or use a dissimilarity measure [5] to compute a dissimilarity matrix that is then clustered. Each of these approaches has issues that need to be addressed when it is applied to big datasets, that is, datasets with a large number of instances relative to the number of attributes.
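
The dissimilarity-based route can be illustrated with a minimal sketch. It assumes a Gower-style measure (range-scaled absolute difference for numerical attributes, simple mismatch for categorical ones) and single-linkage agglomerative clustering; both are illustrative choices, since the abstract does not say which measure [5] defines or which clustering step is used. The function names and the toy records are hypothetical.

```python
from itertools import combinations


def gower_matrix(rows, numeric_cols, categorical_cols):
    """Pairwise Gower-style dissimilarity: mean of per-attribute distances in [0, 1]."""
    n = len(rows)
    # Range of each numerical attribute, used to scale absolute differences.
    ranges = {}
    for c in numeric_cols:
        values = [r[c] for r in rows]
        ranges[c] = (max(values) - min(values)) or 1.0  # avoid division by zero
    n_attrs = len(numeric_cols) + len(categorical_cols)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        total = sum(abs(rows[i][c] - rows[j][c]) / ranges[c] for c in numeric_cols)
        total += sum(rows[i][c] != rows[j][c] for c in categorical_cols)
        d[i][j] = d[j][i] = total / n_attrs
    return d


def single_linkage(d, k):
    """Merge the two closest clusters until k remain; returns a list of index sets."""
    clusters = [{i} for i in range(len(d))]
    while len(clusters) > k:
        # Pick the pair of clusters whose closest members are nearest to each other.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: min(d[i][j] for i in clusters[p[0]] for j in clusters[p[1]]),
        )
        clusters[a] |= clusters[b]
        del clusters[b]  # a < b, so index a is unaffected by the deletion
    return clusters


# Toy mixed records: (age, income, travel_class). All values are made up.
rows = [
    (25, 30000.0, "economy"),
    (27, 32000.0, "economy"),
    (52, 95000.0, "business"),
    (55, 99000.0, "business"),
]
d = gower_matrix(rows, numeric_cols=[0, 1], categorical_cols=[2])
clusters = single_linkage(d, k=2)
```

In this sketch the matrix `d` holds one entry per pair of instances, so its size grows quadratically with the number of instances. That growth is one plausible example of the kind of issue that arises when such methods are applied to datasets with many instances.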

artificial intelligence, data mining, machine learning

Country:
- Europe > Austria (0.28)

Genre:
- Research Report (1.00)

Industry:
- Consumer Products & Services > Travel (0.68)
- Transportation
  - Passenger (0.47)
  - Air (0.47)

Technology:
- Information Technology
  - Data Science > Data Mining (1.00)
  - Artificial Intelligence > Machine Learning
    - Statistical Learning > Clustering (0.97)
