Almost-linear Time Approximation Algorithm to Euclidean $k$-median and $k$-means
la Tour, Max Dupré, Saulpic, David
–arXiv.org Artificial Intelligence
The k-means objective function was introduced by Lloyd in 1957 (and published later in [Llo82]) as a measure of the quality of compression. Given a set of points P and an integer k, minimizing the k-means objective yields a set of k centers that provide a good compressed representation of the original dataset P. Lloyd's original motivation was to compress analog audio signals into numerical ones: numerical signals have to be discrete, and Lloyd proposed a method to find which frequencies should be kept in the discretization. His method was a heuristic trying to minimize what he called the quantization error, which is the sum, for each point, of the squared distance to its representative. This is precisely the k-means cost, and the goal of the k-means problem is to find the set of k representatives (or centers) that minimizes this cost. In contrast, the k-median cost function is the sum, for each point, of the distance to its closest center, inherently giving less weight to the outliers in the dataset.
arXiv.org Artificial Intelligence
Jul-15-2024
- Country:
- North America > United States > Louisiana (0.14)
- Genre:
- Research Report (0.50)
- Workflow (0.46)
- Technology: