Flexible Bivariate Beta Mixture Model: A Probabilistic Approach for Clustering Complex Data Structures

Hsu, Yung-Peng, Chen, Hung-Hsuan

arXiv.org Artificial Intelligence 

This unsupervised learning method is widely used in various applications, including image analysis, information retrieval, text analysis, bioinformatics, and many more [1, 2, 3, 4]. Clustering helps uncover the underlying structure of the data, facilitates data summarization, and sometimes serves as a preprocessing step for other algorithms [2]. Despite its widespread use, one of the primary challenges many traditional clustering algorithms face is that they often assume that the data points form clusters with convex shapes. For example, centroid-based algorithms like k -means and distribution-based models like Gaussian Mixture Models (GMM) typically produce clusters that are hyperspherical or ellipsoidal [5]. Although this assumption simplifies the clustering process, it restricts the flexibility of these models to handle complex data distributions that do not conform to convex shapes.