A parallelizable model-based approach for marginal and multivariate clustering
de Carvalho, Miguel; Venturini, Gabriel Martos; Svetlošák, Andrej
Context and Motivation

Clustering is an unsupervised learning approach for partitioning data into meaningful subsets. The vast literature on cluster analysis is difficult to survey in a few sentences, but concise descriptions of well-known approaches are offered by Hastie et al. (2009), Everitt et al. (2011), and King (2014). Mainstream methods for clustering data include model-based clustering (Bouveyron et al., 2019), similarity-based clustering (MacQueen, 1967; Kaufman and Rousseeuw, 1987), and hierarchical clustering (Hastie et al., 2009, Section 14.3). In this paper we propose a novel model-based approach for cluster analysis that lies at the interface of model-based clustering (i.e., via mixture models) and similarity-based clustering (e.g., via K-means and K-medoids). The proposed approach aims to benefit from the flexibility and soundness of model-based clustering while attempting to mitigate Pitfalls 1 and 2 below.

Model-based clustering is a fast-evolving, interdisciplinary research topic, as can be seen from the recent Handbook of Mixture Analysis (Frühwirth-Schnatter et al., 2019), the survey papers of Melnykov and Maitra (2010), McNicholas (2016), and Gormley et al. (2023), and the references therein.