
Sliding Window Algorithms for k-Clustering Problems

Neural Information Processing Systems

The sliding window model of computation captures scenarios in which data is arriving continuously, but only the latest $w$ elements should be used for analysis. The goal is to design algorithms that update the solution efficiently with each arrival rather than recomputing it from scratch. In this work, we focus on $k$-clustering problems such as $k$-means and $k$-median. In this setting, we provide simple and practical algorithms that offer stronger performance guarantees than previous results. Empirically, we show that our methods store only a small fraction of the data, are orders of magnitude faster, and find solutions with costs only slightly higher than those returned by algorithms with access to the full dataset.
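To make the sliding window model concrete, here is a minimal, hedged sketch of the naive baseline the abstract contrasts against: keep only the latest w points and recompute a clustering from scratch on every arrival. The 1-D Lloyd's heuristic and all function names here are illustrative assumptions, not the paper's algorithm, which avoids exactly this per-arrival recomputation.

```python
from collections import deque
import random

def kmeans_cost(points, centers):
    # Sum of squared distances to the nearest center (the k-means objective, 1-D).
    return sum(min((p - c) ** 2 for c in centers) for p in points)

def lloyd_1d(points, k, iters=10, seed=0):
    # A tiny 1-D Lloyd's heuristic; real implementations work in R^d.
    rng = random.Random(seed)
    centers = rng.sample(list(points), k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: (p - centers[i]) ** 2)].append(p)
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers

def sliding_window_kmeans(stream, w, k):
    # Naive sliding-window baseline: retain only the latest w elements and
    # recompute from scratch after each arrival (what the paper improves on).
    window = deque(maxlen=w)
    for x in stream:
        window.append(x)
        yield lloyd_1d(window, k) if len(window) >= k else list(window)
```

The point of the sketch is the cost model: each arrival triggers a full recomputation over up to w points, whereas the paper's algorithms maintain a small summary and update it efficiently.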


Review for NeurIPS paper: Sliding Window Algorithms for k-Clustering Problems

Neural Information Processing Systems

Summary and Contributions: The paper presents an algorithm for k-clustering in a sliding window streaming model, where k-clustering means the generalization of k-median and k-means to any fixed l_p-norm. The main theoretical result is an algorithm that achieves an O(1)-approximation for points in an arbitrary metric space, and thus in particular for the prevalent Euclidean metric, which is also used in the experimental evaluation. The algorithm operates in the sliding-window streaming model, repeatedly solving the clustering problem on the w most recent points in the stream (for a parameter w). While the minimal requirement is to estimate the cost of a k-clustering, this algorithm also reports k center points. The usual motivation for this model is to allow old data to expire, and to analyze only recent data.
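The review's notion of k-clustering under a fixed l_p-norm can be written as a single objective: sum over points of the p-th power of the distance to the nearest center, with p=1 giving k-median and p=2 giving k-means. A one-dimensional sketch (an illustrative assumption for brevity, not the paper's general metric-space formulation):

```python
def k_clustering_cost(points, centers, p=2):
    # Generalized (k, p)-clustering objective in 1-D:
    # p=1 recovers k-median, p=2 recovers k-means.
    return sum(min(abs(x - c) for c in centers) ** p for x in points)
```

For example, with points {0, 2, 10} and centers {0, 10}, the cost is 4 for p=2 and 2 for p=1, since only the point 2 is off-center.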


Review for NeurIPS paper: Sliding Window Algorithms for k-Clustering Problems

Neural Information Processing Systems

The paper considers the k-clustering problem in the sliding window streaming model. The authors are urged to take the reviews into account when preparing the final version, including explicitly stating a bound on the constant-factor approximation obtained.




One-Shot Coresets: The Case of k-Clustering

Bachem, Olivier, Lucic, Mario, Lattanzi, Silvio

arXiv.org Machine Learning

Scaling clustering algorithms to massive data sets is a challenging task. Recently, several successful approaches based on data summarization methods, such as coresets and sketches, were proposed. While these techniques provide provably good and small summaries, they are inherently problem dependent -- the practitioner has to commit to a fixed clustering objective before even exploring the data. However, can one construct small data summaries for a wide range of clustering problems simultaneously? We affirmatively answer this question by proposing an efficient algorithm that constructs such one-shot summaries for a large family of k-clustering problems while retaining strong theoretical guarantees.
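To illustrate what a problem-independent weighted summary looks like, here is a deliberately crude stand-in: a uniform sample where each kept point carries weight n/m, so that weighted costs estimate costs on the full set for any (k, p) objective. This is only an assumed illustration of the interface; the paper's one-shot coresets use a more careful construction with provable guarantees across the whole family of objectives, which uniform sampling does not offer.

```python
import random

def uniform_summary(points, m, seed=0):
    # Crude problem-independent summary: m points sampled uniformly at random,
    # each weighted by n/m so weighted costs estimate full-data costs.
    rng = random.Random(seed)
    n = len(points)
    sample = rng.sample(points, m)
    return [(x, n / m) for x in sample]

def weighted_cost(summary, centers, p=2):
    # Evaluate any (k, p)-clustering objective on the weighted summary (1-D).
    return sum(w * min(abs(x - c) for c in centers) ** p for x, w in summary)
```

The appeal of the one-shot construction is exactly that a single summary like this can be built before committing to p or k, and then queried for k-median, k-means, or any other member of the family.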