Random Projections for k -means Clustering

Neural Information Processing Systems 

We prove that any set of n points in d dimensions (rows in a matrix A \in \RR {n \times d}) can be projected into t \Omega(k / \eps 2) dimensions, for any \eps \in (0,1/3), in O(n d \lceil \eps {-2} k/ \log(d) \rceil) time, such that with constant probability the optimal k -partition of the point set is preserved within a factor of 2 \eps . The projection is done by post-multiplying A with a d \times t random matrix R having entries 1/\sqrt{t} or -1/\sqrt{t} with equal probability. A numerical implementation of our technique and experiments on a large face images dataset verify the speed and the accuracy of our theoretical results.