A Fuzzy Clustering Algorithm for the Mode Seeking Framework

Bonis, Thomas, Oudot, Steve

arXiv.org Machine Learning 

The analysis of large and possibly high-dimensional datasets is becoming ubiquitous in the sciences. The long-term objective is to gain insight into the structure of measurement or simulation data, for a better understanding of the underlying physical phenomena at work. Clustering is one of the simplest ways of gaining such insight, by finding a suitable decomposition of the data into clusters such that data points within the same cluster share common (and, if possible, exclusive) properties. In this work, we are interested in the mode seeking approach to clustering. This approach assumes the data points to be drawn from some unknown probability distribution and defines the clusters as the basins of attraction of the maxima of the density, requiring a preliminary density estimation phase [7, 5, 10, 11, 13, 15]. The theoretical analysis of this clustering framework has drawn increasing attention recently; see [6, 3, 9, 8, 2]. However, this (hard) clustering method provides fairly limited knowledge of the structure of the data: while the partition into clusters is well understood, the interplay between clusters (respective locations, proximity relations, interactions) remains unknown. Identifying interfaces between clusters is the first step towards a higher-level understanding of the data, and it already plays a prominent role in some applications such as the study of the conformation space of a protein, where a fundamental question beyond the detection of metastable states is to understand when and how the protein can switch from one metastable state to another [12]. Hard clustering can be used in this context, for instance by defining the border between two clusters as the set of data points whose neighborhood (in the ambient space or in some neighborhood graph) intersects the two clusters. However, this kind of information is by nature unstable with respect to perturbations of the data.
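
To make the hard mode seeking scheme and the neighborhood-based border definition concrete, here is a minimal sketch in Python. It is a generic graph-based hill-climbing procedure, not the algorithm proposed in this work. It assumes a k-nearest-neighbor graph as the neighborhood graph and uses the inverse distance to the k-th neighbor as a crude density estimate. The function names (`knn_graph`, `knn_density`, `mode_seeking_clusters`) and the parameter `k` are illustrative choices, and the input is assumed to contain at least two points.

```python
import math


def knn_graph(points, k):
    """For each point, list the indices of its k nearest neighbors (itself excluded)."""
    graph = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        graph.append(others[:k])
    return graph


def knn_density(points, graph):
    """Crude density proxy (an assumption): inverse distance to the k-th neighbor."""
    return [1.0 / (math.dist(p, points[nbrs[-1]]) + 1e-12)
            for p, nbrs in zip(points, graph)]


def mode_seeking_clusters(points, k=3):
    """Hard clustering by hill climbing on a k-NN graph.

    Returns (labels, border): labels[i] is the index of the density peak whose
    basin of attraction contains point i; border lists the points whose
    neighborhood contains a point from another cluster.
    """
    graph = knn_graph(points, k)
    density = knn_density(points, graph)

    # Ties in density are broken by index, so every uphill step strictly
    # increases the key and the climb cannot cycle.
    def key(j):
        return (density[j], -j)

    # Each point steps to the densest point among itself and its neighbors.
    parent = [max([i] + nbrs, key=key) for i, nbrs in enumerate(graph)]

    # Follow the steps until a local maximum (a fixed point) is reached.
    labels = []
    for i in range(len(points)):
        root = i
        while parent[root] != root:
            root = parent[root]
        labels.append(root)

    # Border points: some neighbor carries a different cluster label.
    border = [i for i, nbrs in enumerate(graph)
              if any(labels[j] != labels[i] for j in nbrs)]
    return labels, border
```

Calling `mode_seeking_clusters(points, k=3)` on a list of coordinate tuples returns one peak index per point plus the list of border points. Because `border` depends on which neighbors happen to fall in the k-NN graph, moving a single point slightly can change it, which illustrates the instability noted above.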
