Reviews: A Practical Algorithm for Distributed Clustering and Outlier Detection

Oct-8-2024, 10:26:57 GMT–Neural Information Processing Systems

The paper addresses the problem of performing the distributed k-mean/median clustering in the presence of outliers, and at the same time identifying the outliers. Data are partitioned across multiple sites either adversarially or randomly, and the sites and a central coordinator work jointly by communications to get the data clustering and the outliers. The authors proposed a practical algorithm with bounded running time O(max{k,\log n} n), and bounded communication cost O(s(k\log n t)) and O(sk\log n t) for adversarial and random data partitioning respectively, for a dataset with n data points, k centers, t outliers, and partitioned across s sites. They used a traditional two-level clustering framework (Guha et al. 2017). If using a \gamma -approximation algorithm for (k,t)-mean/median as the second-level clustering algorithm, their distributed algorithm has a bounded O(\gamma) approximation factor. Extensive experimental studies were conducted to compare the performance of their algorithm with three baseline algorithms.

artificial intelligence, data mining, machine learning, (10 more...)

Neural Information Processing Systems

Oct-8-2024, 10:26:57 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology
  - Data Science > Data Mining
    - Anomaly Detection (0.40)
  - Artificial Intelligence > Machine Learning
    - Statistical Learning > Clustering (0.58)