Mixed Data Clustering Survey and Challenges

Guerard, Guillaume, Djebali, Sonia

arXiv.org Artificial Intelligence 

This paradigm challenges traditional data management and analysis techniques by demanding innovative solutions capable of processing, analyzing, and deriving insights from vast and diverse datasets. In particular, the inclusion of mixed data types, such as numerical and categorical variables, poses significant challenges to conventional methodologies, necessitating the development of novel approaches to effectively leverage the wealth of information available [2]. Traditionally, data handling methods were designed around homogeneous datasets, typically consisting of numerical values. However, the big data paradigm introduces a multitude of data types, including structured, unstructured, and semi-structured data, which demand a departure from traditional approaches. Moreover, the three primary characteristics of big data--volume, velocity, and variety--amplify the complexity of data analysis, requiring scalable and adaptable solutions capable of processing large volumes of data at high speeds while accommodating diverse data formats and structures. These methods for handling mixed data often involve separate analyses of categorical and numerical variables, treating them as distinct entities rather than integrating their interdependencies.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found