Reviews: k-Means Clustering of Lines for Big Data

Jan-24-2025, 04:36:11 GMT–Neural Information Processing Systems

The authors consider the problem of clustering a set of lines in R d. The goal is to minimize the k-means objective: given n lines L in R d find the best set of k points c1,...,ck in R d so as to minimize sum_{l in L} min_{ci} dist(ci, l) 2. This a clean, nicely motivated problem. The authors provide a coreset construction (namely a small size summary of the input so that any alpha-approximation for the summary yields an alpha(1 epsilon)-approximation for the entire input). This implies the first (1 epsilon)-approximation for the problem with running time nd exp(poly(k)) together with a streaming algorithm with similar running time and memory size 2 {poly(k)} log n. En route to the result the authors provide a bicriteria approximation algorithms: namely a solution that contains O(k (log n dk log k)) centers and whose cost is at most 4 times the cost of the optimal solution with k centers.

algorithm, big data, k-means clustering, (3 more...)

Neural Information Processing Systems

Jan-24-2025, 04:36:11 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.52)