An Effective Semi-supervised Divisive Clustering Algorithm
Diverse experimental data, ranging from microarray gene expression data in biology to spectral data in astronomy, need to be clustered to reveal meaningful correlations in the data. Massive collections of documents and images on the Internet also need to be organized effectively to improve the efficiency of search engines. Clustering methods such as K-means (1) are popular for their simplicity, yet they are sensitive to noise and initialization and are therefore limited in reliability. Hierarchical clustering (HC) (2) is simple and intuitive and thus widely used, especially in biology (3), but it is computationally expensive (4) and its result varies with the choice of similarity measure between clusters. Moreover, the cluster number for the above methods must be either prespecified (e.g., K-means) or determined by a threshold (e.g., HC). Some other well-known algorithms either involve complex optimization and postprocessing (5) or have a limited range of applications, being restricted by the distribution (6) or the attributes of the data (7, 8). Although affinity propagation (AP) (9) performs much better than K-means and determines the cluster number automatically, it is not good at detecting nonspherical clusters (10). Recently, two effective clustering algorithms (10, 11) were proposed, which together form a pool of clustering methods based on the in-tree structure (11). However, both involve a free parameter.
Jan-6-2015