Fast Distributed k-Center Clustering with Outliers on Massive Data

Malkomes, Gustavo, Kusner, Matt J., Chen, Wenlin, Weinberger, Kilian Q., Moseley, Benjamin

Neural Information Processing Systems 

Clustering large data is a fundamental problem with a vast number of applications. Due to the increasing size of data, practitioners interested in clustering have turned to distributed computation methods. In this work, we consider the widely used k-center clustering problem and its variant used to handle noisy data, k-center with outliers. In the noise-free setting we demonstrate how a previously-proposed distributed method is actually an O(1)-approximation algorithm, which accurately explains its strong empirical performance. Additionally, in the noisy setting, we develop a novel distributed algorithm that is also an O(1)-approximation.