 k-means solution


Reviews: Query K-means Clustering and the Double Dixie Cup Problem

Neural Information Processing Systems

This paper investigates active semi-supervised clustering under both noiseless (perfect oracle) and noisy (imperfect oracle) query responses. The authors provide probabilistic guarantees of low approximation error relative to the optimal k-means objective. The corresponding query complexities are substantially lower than those in the existing literature. Importantly, as the authors note, the query complexity is independent of the size of the dataset. The main strength of the paper lies in the considerable technical rigour with which the subject is handled.


Breathing $k$-Means

arXiv.org Machine Learning

We propose a new algorithm for the $k$-means problem which repeatedly increases and decreases the number of centroids by $m$ in order to find an approximate solution. New centroids are inserted in areas where they will likely reduce the error. The subsequent removal of centroids is done such that the resulting rise in error is small. After each increase or decrease step, standard $k$-means is performed. Termination is guaranteed by decrementing $m$ after each increase/decrease cycle unless the overall error was lowered. In experiments with Gaussian mixture distributions, the new algorithm produced solutions that were on average several percent better than those of $k$-means++.
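
The increase/decrease cycle described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how insertion and removal sites are chosen, so the sketch assumes: new centroids are placed next to the centroids of the highest-error clusters; the cost of removing a centroid is estimated as the error increase if its points moved to their second-nearest centroid; initialization uses random data points; and a small relative tolerance `tol` decides whether the error "was lowered". The names `breathing_kmeans`, `lloyd`, `sse`, and `sq_dists` are hypothetical helpers introduced here.

```python
import numpy as np


def sq_dists(X, C):
    """Squared Euclidean distances, shape (n_points, n_centroids)."""
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)


def lloyd(X, C, iters=50):
    """Standard k-means (Lloyd) iterations starting from centroids C."""
    C = C.copy()
    for _ in range(iters):
        labels = sq_dists(X, C).argmin(axis=1)
        new_C = C.copy()
        for j in range(len(C)):
            members = X[labels == j]
            if len(members):  # an empty cluster keeps its old centroid
                new_C[j] = members.mean(axis=0)
        if np.allclose(new_C, C):
            break
        C = new_C
    return C


def sse(X, C):
    """Sum of squared distances from each point to its nearest centroid."""
    return sq_dists(X, C).min(axis=1).sum()


def breathing_kmeans(X, k, m=5, tol=1e-4, seed=0):
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumption: initialize from k random data points, then run k-means.
    C = lloyd(X, X[rng.choice(len(X), size=k, replace=False)])
    best_err = sse(X, C)
    while m > 0:
        # Increase step (assumed rule): add centroids next to the
        # centroids of the (up to) m clusters with the largest error.
        D = sq_dists(X, C)
        labels = D.argmin(axis=1)
        cluster_err = np.bincount(labels, weights=D.min(axis=1), minlength=len(C))
        worst = np.argsort(cluster_err)[::-1][:m]
        jitter = 1e-3 * X.std(axis=0) * rng.standard_normal((len(worst), X.shape[1]))
        grown = lloyd(X, np.vstack([C, C[worst] + jitter]))

        # Decrease step (assumed rule): drop the centroids whose removal
        # is estimated to raise the error least.
        D = sq_dists(X, grown)
        labels = D.argmin(axis=1)
        two = np.sort(D, axis=1)[:, :2]  # nearest and second-nearest distances
        removal_cost = np.bincount(
            labels, weights=two[:, 1] - two[:, 0], minlength=len(grown)
        )
        keep = np.argsort(removal_cost)[len(worst):]
        candidate = lloyd(X, grown[keep])

        err = sse(X, candidate)
        if err < best_err * (1 - tol):
            C, best_err = candidate, err  # error lowered: keep m unchanged
        else:
            m -= 1  # no improvement: shrink the step size
    return C, best_err
```

Because a candidate solution is accepted only when it lowers the error, the returned error is never worse than that of the initial k-means run in this sketch. The removal estimate treats each centroid in isolation, so removing several neighbouring centroids at once may cost more than estimated.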