Hierarchical Clustering Via Localized Diffusion Folders
David, Gil (Yale University) | Averbuch, Amir (Tel-Aviv University) | Coifman, Ronald R. (Yale University)
Data clustering is a common technique for statistical data analysis, used in fields such as machine learning, data mining, customer segmentation, trend analysis, pattern recognition, and image analysis. The proposed Localized Diffusion Folders methodology performs hierarchical clustering of high-dimensional datasets. Diffusion folders are a multi-level partitioning of the data into local neighborhoods. They are generated by several random selections of data points and folders in a diffusion graph and by defining local diffusion distances between them. This multi-level partitioning defines an improved localized geometry of the data and a localized Markov transition matrix that is used for the next time step of the diffusion process. The result is a bottom-up hierarchical clustering of the data in which each level of the hierarchy contains localized diffusion folders of folders from the lower levels. The methodology preserves the local neighborhood of each point while eliminating noisy connections between distinct points and areas in the graph. The performance of the algorithms is demonstrated on real data and compared with existing methods.
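The core idea of one level of the hierarchy can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes a Gaussian kernel with a hypothetical scale parameter `eps`, uses a single random selection of seed points rather than several, and replaces the diffusion distance with a plain Euclidean distance between transition rows (the usual definition also weights each coordinate by the stationary distribution). The function names `markov_matrix`, `diffusion_distance`, and `build_folders` are illustrative only.

```python
import math
import random


def markov_matrix(points, eps):
    """Row-normalize a Gaussian kernel into a Markov transition matrix."""
    n = len(points)
    kernel = [
        [
            math.exp(-sum((a - b) ** 2 for a, b in zip(points[i], points[j])) / eps)
            for j in range(n)
        ]
        for i in range(n)
    ]
    return [[w / sum(row) for w in row] for row in kernel]


def diffusion_distance(P, i, j):
    """Simplified diffusion distance: Euclidean distance between rows i and j.

    Assumption: the stationary-distribution weighting is omitted for brevity.
    """
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(P[i], P[j])))


def build_folders(P, n_seeds, rng):
    """Pick random seed points and assign every point to its nearest seed."""
    n = len(P)
    seeds = rng.sample(range(n), n_seeds)
    folders = {s: [] for s in seeds}
    for i in range(n):
        nearest = min(seeds, key=lambda s: diffusion_distance(P, i, s))
        folders[nearest].append(i)
    return list(folders.values())


# Toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
P = markov_matrix(points, eps=1.0)
folders = build_folders(P, n_seeds=2, rng=random.Random(0))
```

In this sketch every point lands in exactly one folder. A further level could be built by treating each folder as a single node and repeating the same steps on a folder-to-folder affinity matrix, which is where the localized Markov matrix described above would come in.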
Nov-5-2010