Simple, Scalable and Effective Clustering via One-Dimensional Projections Monika Henzinger Stanford University Institute of Science and Technology Austria (ISTA) Stanford, CA

Neural Information Processing Systems 

Clustering is a fundamental problem in unsupervised machine learning with many applications in data analysis. Popular clustering algorithms such as Lloyd's algorithm and k-means++ can take Ω(ndk) time when clustering n points in a d-dimensional space (represented by an n d matrix X) into k clusters. In applications with moderate to large k, the multiplicative k factor can become very expensive. We introduce a simple randomized clustering algorithm that provably runs in expected time O(nnz(X) + n log n) for arbitrary k. Here nnz(X) is the total number of non-zero entries in the input dataset X, which is upper bounded by nd and can be significantly smaller for sparse datasets.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found