Simple, Scalable and Effective Clustering via One-Dimensional Projections

Jan-19-2025, 22:23:27 GMT–Neural Information Processing Systems

Clustering is a fundamental problem in unsupervised machine learning with many applications in data analysis. Popular clustering algorithms such as Lloyd's algorithm and k -means can take \Omega(ndk) time when clustering n points in a d -dimensional space (represented by an n\times d matrix X) into k clusters. On massive datasets with moderate to large k, the multiplicative k factor can become very expensive. We introduce a simple randomized clustering algorithm that provably runs in expected time O(\mathsf{nnz}(X) n\log n) for arbitrary k . Here \mathsf{nnz}(X) is the total number of non-zero entries in the input dataset X, which is upper bounded by nd and can be significantly smaller for sparse datasets.

algorithm, one-dimensional projection, scalable and effective clustering, (3 more...)

Neural Information Processing Systems

Jan-19-2025, 22:23:27 GMT

Conferences Web Page

Add feedback

Country:
- Asia > Afghanistan > Parwan Province > Charikar (0.08)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.64)