A Constant Approximation Algorithm for Sequential No-Substitution k-Median Clustering under a Random Arrival Order

Hess, Tom, Moshkovitz, Michal, Sabato, Sivan

Feb-8-2021, arXiv.org Machine Learning

Clustering is a fundamental unsupervised learning task used for various applications, such as anomaly detection (Leung and Leckie, 2005), recommender systems (Shepitsen et al., 2008) and cancer diagnosis (Zheng et al., 2014). In recent years, sequential clustering has been actively studied, motivated by applications in which data arrives sequentially, such as online recommender systems (Nasraoui et al., 2007) and online community detection (Aggarwal, 2003). In this work, we study k-median clustering in the sequential no-substitution setting, a term first introduced in Hess and Sabato (2020). In this setting, a stream of data points is observed sequentially, and the algorithm selects some of these points as cluster centers. A point can be selected as a center only immediately after it is observed, before the next point is observed, and a selected center cannot be substituted later. This setting is motivated by applications in which center selection corresponds to an irreversible real-world action, such as providing users with promotional gifts or recruiting participants to a clinical trial. The goal in the no-substitution k-median setting is to obtain a near-optimal k-median risk value, while selecting a number of centers that is as close as possible to k.
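The setting described above can be illustrated with a minimal Python sketch. It is not the algorithm from the paper: it only shows the protocol (one pass over the stream, an immediate and irreversible decision per point) and the quantity being evaluated. It assumes Euclidean distance and the usual definition of the k-median risk as the sum of distances from each point to its nearest center. The names `kmedian_risk`, `run_no_substitution` and `far_point_policy` are hypothetical, and the threshold rule is a placeholder selection policy chosen only to keep the example short.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def kmedian_risk(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """Sum over all points of the Euclidean distance to the nearest center."""
    if not centers:
        return math.inf
    return sum(min(math.dist(p, c) for c in centers) for p in points)


def run_no_substitution(
    stream: Sequence[Point],
    policy: Callable[[Point, List[Point]], bool],
) -> List[Point]:
    """Feed points one at a time; each decision is immediate and irreversible."""
    centers: List[Point] = []
    for x in stream:
        # The policy sees only the current point and a copy of the centers so far.
        if policy(x, list(centers)):
            centers.append(x)  # append-only: a center is never removed or replaced
    return centers


def far_point_policy(threshold: float) -> Callable[[Point, List[Point]], bool]:
    """Placeholder rule (an assumption, not the paper's method):
    select a point if it is farther than `threshold` from every current center."""
    def policy(x: Point, centers: List[Point]) -> bool:
        return all(math.dist(x, c) > threshold for c in centers)
    return policy


stream = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.0, 5.2), (0.0, 0.2)]
centers = run_no_substitution(stream, far_point_policy(threshold=1.0))
risk = kmedian_risk(stream, centers)
print(centers, risk)
```

Because the list of centers is append-only and the policy is called exactly once per point, a point that was passed over can never be selected later. This is the constraint that separates the no-substitution setting from ordinary streaming clustering, where centers may be revised.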
