Reviews: k-Means Clustering of Lines for Big Data
–Neural Information Processing Systems
The authors consider the problem of clustering a set of lines in R d. The goal is to minimize the k-means objective: given n lines L in R d find the best set of k points c1,...,ck in R d so as to minimize sum_{l in L} min_{ci} dist(ci, l) 2. This a clean, nicely motivated problem. The authors provide a coreset construction (namely a small size summary of the input so that any alpha-approximation for the summary yields an alpha(1 epsilon)-approximation for the entire input). This implies the first (1 epsilon)-approximation for the problem with running time nd exp(poly(k)) together with a streaming algorithm with similar running time and memory size 2 {poly(k)} log n. En route to the result the authors provide a bicriteria approximation algorithms: namely a solution that contains O(k (log n dk log k)) centers and whose cost is at most 4 times the cost of the optimal solution with k centers.
Neural Information Processing Systems
Jan-24-2025, 04:36:11 GMT
- Technology: