Monte Carlo approximation certificates for k-means clustering

Oct-2-2017–arXiv.org Machine Learning

Efficient algorithms for $k$-means clustering frequently converge to suboptimal partitions, and given a partition, it is difficult to detect $k$-means optimality. In this paper, we develop an a posteriori certifier of approximate optimality for $k$-means clustering. The certifier is a sub-linear Monte Carlo algorithm based on Peng and Wei's semidefinite relaxation of $k$-means. In particular, solving the relaxation for small random samples of the dataset produces a high-confidence lower bound on the $k$-means objective, and being sub-linear, our algorithm is faster than $k$-means++ when the number of data points is large. We illustrate the performance of our algorithm with both numerical experiments and a performance guarantee: If the data points are drawn independently from any mixture of two Gaussians over $\mathbb{R}^m$ with identity covariance, then with probability $1-O(1/m)$, our $\operatorname{poly}(m)$-time algorithm produces a 3-approximation certificate with 99% confidence.

artificial intelligence, certificate, machine learning, (16 more...)

arXiv.org Machine Learning

Oct-2-2017

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - New York (0.14)
  - Ohio (0.14)
  - California (0.14)

Genre:
- Research Report (0.82)

Industry:
- Government > Regional Government (0.46)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found