Fast Distributed k-Center Clustering with Outliers on Massive Data

Malkomes, Gustavo, Kusner, Matt J., Chen, Wenlin, Weinberger, Kilian Q., Moseley, Benjamin

Feb-14-2020, 08:26:09 GMT–Neural Information Processing Systems

Clustering large data is a fundamental problem with a vast number of applications. Due to the increasing size of data, practitioners interested in clustering have turned to distributed computation methods. In this work, we consider the widely used k-center clustering problem and its variant used to handle noisy data, k-center with outliers. In the noise-free setting we demonstrate how a previously-proposed distributed method is actually an O(1)-approximation algorithm, which accurately explains its strong empirical performance. Additionally, in the noisy setting, we develop a novel distributed algorithm that is also an O(1)-approximation.

algorithm, k-center clustering, massive data, (1 more...)

Neural Information Processing Systems

Feb-14-2020, 08:26:09 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.45)