In the modal approach to clustering, clusters are defined through the local maxima of the underlying probability density function, which can be estimated either non-parametrically or using finite mixture models. Clusters are thus closely tied to regions around the density modes, and every cluster corresponds to a bump of the density. The Modal EM algorithm is an iterative procedure that can identify the local maxima of any density function. In this contribution, we propose a fast and efficient Modal EM algorithm for use when the density function is estimated through a finite mixture of Gaussian distributions with parsimonious component-covariance structures. After describing the procedure, we apply the proposed Modal EM algorithm to both simulated and real data examples, showing its high flexibility in several contexts.
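
As an illustration of the idea described in this abstract, the following is a minimal sketch of the standard Modal EM iteration for a Gaussian mixture, not the fast variant the authors propose. It is restricted to one dimension for brevity; in d dimensions the variances become covariance matrices and the division becomes a linear solve. The function names `normal_pdf` and `modal_em_1d`, the tolerance, and the toy mixture parameters are all assumptions made for this example.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate Gaussian N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def modal_em_1d(x0, weights, means, sds, tol=1e-10, max_iter=1000):
    """Climb from x0 to a local mode of a univariate Gaussian mixture.

    Hypothetical helper for illustration: each iteration computes the
    posterior component probabilities at the current point (E-step) and
    then maximises the posterior-weighted sum of component log-densities,
    which for Gaussians is a precision-weighted mean (M-step).
    """
    x = float(x0)
    for _ in range(max_iter):
        # E-step: posterior probability of each component at the current x.
        dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
        total = sum(dens)
        post = [d / total for d in dens]
        # M-step: closed-form maximiser of sum_k post_k * log f_k(x).
        num = sum(p * m / s ** 2 for p, m, s in zip(post, means, sds))
        den = sum(p / s ** 2 for p, s in zip(post, sds))
        x_new = num / den
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x


# Toy mixture (assumed values): two well-separated, equally weighted components.
weights, means, sds = [0.5, 0.5], [0.0, 10.0], [1.0, 1.0]
mode_left = modal_em_1d(1.0, weights, means, sds)
mode_right = modal_em_1d(9.0, weights, means, sds)
```

Under this scheme, data points whose iterations converge to the same mode would be assigned to the same cluster. Starting points very far from every component can make all densities underflow to zero, so a more careful version would work with log-densities.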

Recently, a number of statistical problems have found an unexpected solution by inspecting them through a "modal point of view". These include classical tasks such as clustering or regression. This has led to a renewed interest in estimation and inference for the mode. This paper offers an extensive survey of the traditional approaches to mode estimation and explores the consequences of applying this modern modal methodology to other, seemingly unrelated, fields.

Despite its popularity, it is widely recognized that the investigation of some theoretical aspects of clustering has been relatively sparse. One of the main reasons for this lack of theoretical results is surely the fact that, whereas for other statistical problems the theoretical population goal is clearly defined (as in regression or classification), for some of the clustering methodologies it is difficult to specify the population goal to which the data-based clustering algorithms should try to get close. This paper aims to provide some insight into the theoretical foundations of clustering by focusing on two main objectives: to provide an explicit formulation for the ideal population goal of the modal clustering methodology, which understands clusters as regions of high density; and to present two new loss functions, applicable in fact to any clustering methodology, to evaluate the performance of a data-based clustering algorithm with respect to the ideal population goal. In particular, it is shown that only mild conditions on a sequence of density estimators are needed to ensure that the sequence of modal clusterings that they induce is consistent.

The problem of finding groups in data (cluster analysis) has been extensively studied by researchers from the fields of Statistics and Computer Science, among others. However, despite its popularity, it is widely recognized that the investigation of some theoretical aspects of clustering has been relatively sparse. One of the main reasons for this lack of theoretical results is surely the fact that, unlike the situation with other statistical problems such as regression or classification, for some of the clustering methodologies it is quite difficult to specify a population goal to which the data-based clustering algorithms should try to get close. This paper aims to provide some insight into the theoretical foundations of the usual nonparametric approach to clustering, which understands clusters as regions of high density, by presenting an explicit formulation for the ideal population clustering.

Arias-Castro, Ery, Qiao, Wanli

Up until the 1970s there were two main ways of clustering points in space. One of them, perhaps pioneered by Pearson [44], was to fit a (usually Gaussian) mixture to the data and then classify each data point (as well as any other point available at a later date) according to the most likely component in the mixture. The other was based on a direct partitioning of the space, most notably by minimization of the average minimum squared distance to a center: the K-means problem, whose computational difficulty led to a number of famous algorithms [22, 31, 36, 37, 39] and likely played a role in motivating the development of hierarchical clustering [21, 25, 54, 63]. In the 1970s, two decidedly nonparametric approaches to clustering were proposed, both based on the topography given by the population density. Of course, in practice, the density is estimated, often by some form of kernel density estimation.