Simple, Scalable and Effective Clustering via One-Dimensional Projections Monika Henzinger Stanford University Institute of Science and Technology Austria (ISTA) Stanford, CA
–Neural Information Processing Systems
Clustering is a fundamental problem in unsupervised machine learning with many applications in data analysis. Popular clustering algorithms such as Lloyd's algorithm and k-means++ can take Ω(ndk) time when clustering n points in a d-dimensional space (represented by an n d matrix X) into k clusters. In applications with moderate to large k, the multiplicative k factor can become very expensive. We introduce a simple randomized clustering algorithm that provably runs in expected time O(nnz(X) + n log n) for arbitrary k. Here nnz(X) is the total number of non-zero entries in the input dataset X, which is upper bounded by nd and can be significantly smaller for sparse datasets.
Neural Information Processing Systems
Feb-11-2025, 11:51:08 GMT
- Country:
- Europe (1.00)
- North America > United States
- California > Santa Clara County > Stanford (0.40)
- Technology: