Clustering Based on Graph of Density Topology
Gao, Zhangyang; Lin, Haitao; Li, Stan Z.
Unsupervised clustering is a fundamental problem in machine learning, which aims to group unlabeled data points into clusters. Numerous clustering methods have been proposed, including k-means [2, 19], spectral clustering [21, 23], OPTICS [1] and others [5, 7, 10, 14, 18, 21]. However, clustering algorithms long struggled with unevenly distributed data under high levels of noise, until HDBSCAN was proposed [4, 16]. The key insight of HDBSCAN rests on the density clustering assumption: in an appropriate metric space, data points tend to form clusters in high-density areas, whereas noise tends to appear in low-density areas. By dropping noise points and maximizing the stability of the clustering, HDBSCAN made a great advance in assigning samples to clusters. However, HDBSCAN (like other methods) has the following weaknesses: (1) It detects the global topological structure based on connectivity defined on individual points, which makes it sensitive to bridge-like noise (see Figure 1) between two clusters.
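The sensitivity of point-level connectivity to bridge-like noise can be illustrated with a toy example. The sketch below is not HDBSCAN and is not the method proposed in this paper; it assumes a plain epsilon-neighborhood graph as a stand-in for "connectivity defined on individual points", and the names `count_components`, `left`, `right`, `bridge` and `eps` are hypothetical. It shows that a thin chain of noise points is enough to merge two otherwise well-separated dense groups.

```python
import math


def count_components(points, eps):
    """Count connected components of the graph that links every pair of
    points whose Euclidean distance is at most eps (union-find based)."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= eps:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(points))})


# Two dense 5x5 grids (spacing 0.1) separated by a wide empty gap.
left = [(0.1 * x, 0.1 * y) for x in range(5) for y in range(5)]
right = [(2.0 + 0.1 * x, 0.1 * y) for x in range(5) for y in range(5)]

# Bridge-like noise: a single thin chain of points crossing the gap.
bridge = [(0.4 + 0.1 * k, 0.2) for k in range(1, 16)]

eps = 0.15  # assumed neighborhood radius, slightly above the grid diagonal
print(count_components(left + right, eps))           # 2
print(count_components(left + right + bridge, eps))  # 1
```

The 15 bridge points have far lower density than either grid, yet they alone decide whether the graph reports one cluster or two, which is the failure mode described in weakness (1).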
Sep-24-2020