Robust k-means: a Theoretical Revisit

Apr-10-2023, 09:27:02 GMT–Neural Information Processing Systems

Over the last years, many variations of the quadratic k-means clustering procedure have been proposed, all aiming to robustify the performance of the algorithm in the presence of outliers. In general terms, two main approaches have been developed: one based on penalized regularization methods, and one based on trimming functions. In this work, we present a theoretical analysis of the robustness and consistency properties of a variant of the classical quadratic k-means algorithm, the robust k-means, which borrows ideas from outlier detection in regression. We show that two outliers in a dataset are enough to breakdown this clustering procedure. However, if we focus on "well-structured" datasets, then robust k-means can recover the underlying cluster structure in spite of the outliers. Finally, we show that, with slight modifications, the most general non-asymptotic results for consistency of quadratic k-means remain valid for this robust variant.

dataset, outlier, rkm, (15 more...)

Neural Information Processing Systems

Apr-10-2023, 09:27:02 GMT

Conferences PDF

Add feedback

Country:
- North America > United States
  - New York > New York County > New York City (0.04)
- Europe
  - Greece (0.04)
  - Spain > Catalonia
    - Barcelona Province > Barcelona (0.04)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.66)