Reviews: A Practical Algorithm for Distributed Clustering and Outlier Detection
–Neural Information Processing Systems
The paper addresses the problem of performing the distributed k-mean/median clustering in the presence of outliers, and at the same time identifying the outliers. Data are partitioned across multiple sites either adversarially or randomly, and the sites and a central coordinator work jointly by communications to get the data clustering and the outliers. The authors proposed a practical algorithm with bounded running time O(max{k,\log n} n), and bounded communication cost O(s(k\log n t)) and O(sk\log n t) for adversarial and random data partitioning respectively, for a dataset with n data points, k centers, t outliers, and partitioned across s sites. They used a traditional two-level clustering framework (Guha et al. 2017). If using a \gamma -approximation algorithm for (k,t)-mean/median as the second-level clustering algorithm, their distributed algorithm has a bounded O(\gamma) approximation factor. Extensive experimental studies were conducted to compare the performance of their algorithm with three baseline algorithms.
Neural Information Processing Systems
Oct-8-2024, 10:26:57 GMT
- Technology: