Linear time small coresets for k-mean clustering of segments with applications
Denisov, David, Dolev, Shlomi, Felmdan, Dan, Segal, Michael
–arXiv.org Artificial Intelligence
We study the $k$-means problem for a set $\mathcal{S} \subseteq \mathbb{R}^d$ of $n$ segments, aiming to find $k$ centers $X \subseteq \mathbb{R}^d$ that minimize $D(\mathcal{S},X) := \sum_{S \in \mathcal{S}} \min_{x \in X} D(S,x)$, where $D(S,x) := \int_{p \in S} |p - x| dp$ measures the total distance from each point along a segment to a center. Variants of this problem include handling outliers, employing alternative distance functions such as M-estimators, weighting distances to achieve balanced clustering, or enforcing unique cluster assignments. For any $\varepsilon > 0$, an $\varepsilon$-coreset is a weighted subset $C \subseteq \mathbb{R}^d$ that approximates $D(\mathcal{S},X)$ within a factor of $1 \pm \varepsilon$ for any set of $k$ centers, enabling efficient streaming, distributed, or parallel computation. We propose the first coreset construction that provably handles arbitrary input segments. For constant $k$ and $\varepsilon$, it produces a coreset of size $O(\log^2 n)$ computable in $O(nd)$ time. Experiments, including a real-time video tracking application, demonstrate substantial speedups with minimal loss in clustering accuracy, confirming both the practical efficiency and theoretical guarantees of our method.
arXiv.org Artificial Intelligence
Nov-21-2025
- Country:
- Asia
- Brunei (0.04)
- Malaysia (0.04)
- Middle East > Israel
- Haifa District > Haifa (0.04)
- Southern District > Beer-Sheva (0.04)
- Singapore (0.04)
- Europe
- Germany > Lower Saxony
- Gottingen (0.04)
- Italy (0.04)
- Germany > Lower Saxony
- North America > United States
- New York > New York County > New York City (0.04)
- Asia
- Genre:
- Research Report > New Finding (0.68)
- Industry:
- Information Technology (0.46)
- Technology: