Almost-linear Time Approximation Algorithm to Euclidean $k$-median and $k$-means

la Tour, Max Dupré, Saulpic, David

arXiv.org Artificial Intelligence 

The k-means objective function was introduced by Lloyd in 1957 (and published later in [Llo82]) as a measure of the quality of compression. Given a set of points P and an integer k, minimizing the k-means objective yields a set of k centers that provide a good compressed representation of the original dataset P. Lloyd's original motivation was to compress analog audio signals into numerical ones: numerical signals have to be discrete, and Lloyd proposed a method to find which frequencies should be kept in the discretization. His method was a heuristic trying to minimize what he called the quantization error, which is the sum, for each point, of the squared distance to its representative. This is precisely the k-means cost, and the goal of the k-means problem is to find the set of k representatives (or centers) that minimizes this cost. In contrast, the k-median cost function is the sum, for each point, of the distance to its closest center, inherently giving less weight to the outliers in the dataset.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found