Robust k-means: a Theoretical Revisit

Neural Information Processing Systems

In recent years, many variations of the quadratic k-means clustering procedure have been proposed, all aiming to make the algorithm more robust in the presence of outliers. Broadly, two main approaches have been developed: one based on penalized regularization methods and one based on trimming functions. In this work, we present a theoretical analysis of the robustness and consistency properties of a variant of the classical quadratic k-means algorithm, the robust k-means, which borrows ideas from outlier detection in regression. We show that two outliers in a dataset are enough to break down this clustering procedure. However, if we focus on "well-structured" datasets, then robust k-means can recover the underlying cluster structure in spite of the outliers. Finally, we show that, with slight modifications, the most general non-asymptotic results for consistency of quadratic k-means remain valid for this robust variant.
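To make the idea of a k-means variant with explicit outlier handling concrete, here is a minimal sketch in Python. It is not taken from the paper: it assumes one common formulation in which each point x_i gets an outlier vector o_i, the cost is the squared distance of x_i - o_i to its nearest centre plus a penalty on o_i, and the penalty is chosen so that the outlier step becomes a hard threshold at radius `lam`. The function name `robust_kmeans`, its parameters, and the toy data are all hypothetical.

```python
import numpy as np

def robust_kmeans(X, init_centers, lam, n_iter=50):
    """Alternating minimisation sketch (assumed formulation, hard-threshold outlier step)."""
    centers = np.array(init_centers, dtype=float)
    for _ in range(n_iter):
        # Assignment step: nearest centre for every point.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        residuals = X - centers[labels]
        # Outlier step: a point farther than lam from its centre absorbs its whole residual.
        is_outlier = np.linalg.norm(residuals, axis=1) > lam
        outliers = np.where(is_outlier[:, None], residuals, 0.0)
        # Centre step: mean of the outlier-corrected points in each cluster.
        corrected = X - outliers
        for l in range(len(centers)):
            members = labels == l
            if members.any():
                centers[l] = corrected[members].mean(axis=0)
    return centers, labels, is_outlier

# Toy data (hypothetical): two tight groups of four points plus one distant point.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11],
              [100, 100]], dtype=float)
centers, labels, is_outlier = robust_kmeans(X, init_centers=X[[0, 4]], lam=3.0)
print(centers)
print(is_outlier)
```

With this initialisation the distant point is flagged and stops pulling on its centre. The outcome depends on the initial centres: if the distant point itself is chosen as a starting centre, this sketch keeps it as a cluster of its own, which is one way to see why a small number of outliers can be enough to break the procedure down.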



Convex Optimization Procedure for Clustering: Theoretical Revisit

Neural Information Processing Systems

In this paper, we present a theoretical analysis of SON, a convex optimization procedure for clustering that uses a sum-of-norms (SON) regularization recently proposed in \cite{ICML2011Hocking_419,SON, Lindsten650707, pelckmans2005convex}. In particular, we show that if the samples are drawn from two cubes, each being one cluster, then SON can provably identify the cluster membership, provided that the distance between the two cubes is larger than a threshold that depends linearly on the size of the cube and on the ratio of the numbers of samples in each cluster. To the best of our knowledge, this paper is the first to provide a rigorous analysis of why and when SON works. We believe this may provide important insights for developing novel convex-optimization-based algorithms for clustering.
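The abstract does not write out the SON objective. A form commonly stated in the convex clustering literature gives every sample x_i its own representative u_i and minimises 0.5 * sum_i ||x_i - u_i||^2 + lam * sum_{i<j} ||u_i - u_j||_2; samples whose representatives coincide are placed in the same cluster. The sketch below only evaluates that objective for a few candidate choices of U. The factor 0.5, the Euclidean norm in the penalty, the name `son_objective`, and the toy data are assumptions made for illustration.

```python
import numpy as np

def son_objective(X, U, lam):
    """0.5 * sum_i ||x_i - u_i||^2 + lam * sum_{i<j} ||u_i - u_j||_2 (assumed form)."""
    fit = 0.5 * np.sum((X - U) ** 2)
    i, j = np.triu_indices(len(U), k=1)  # all index pairs with i < j
    fusion = np.sum(np.linalg.norm(U[i] - U[j], axis=1))
    return fit + lam * fusion

# Toy data (hypothetical): two groups of two points each.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0]])

U_separate = X.copy()                                  # every point is its own cluster
U_two = np.array([[0.0, 0.5]] * 2 + [[5.0, 0.5]] * 2)  # one representative per group
U_one = np.tile(X.mean(axis=0), (4, 1))                # a single shared representative

for name, U in [("separate", U_separate), ("two groups", U_two), ("one group", U_one)]:
    print(name, son_objective(X, U, lam=0.5))
```

These three candidates are not the minimisers of the objective; they only show how the fusion penalty trades data fit against the number of distinct representatives as `lam` changes.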


Reviews: Robust k-means: a Theoretical Revisit

Neural Information Processing Systems

In this paper, the authors study theoretical properties of the robust k-means (RKM) formulation proposed in [5,23]. They first study the robustness property, showing that if the function f_\lambda is convex, then one outlier is sufficient to break down the algorithm, and that if f_\lambda need not be convex, then two outliers can break down the algorithm. On the other hand, under some structural assumptions on the non-outliers, a non-trivial breakdown point can be established for RKM. The authors then study consistency, generalising consistency results that are known for convex f_\lambda to non-convex f_\lambda. My main concern is that the results appear very specific, and I am not entirely sure whether they will appeal to a more general audience in machine learning.
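The review does not say which penalties f_\lambda are considered. As an illustration of the convex versus non-convex distinction, the sketch below contrasts two standard scalar thresholding operators: soft-thresholding, which solves a problem with a convex absolute-value penalty, and hard-thresholding, which solves one with a non-convex penalty that charges a fixed cost for any non-zero value. Treating these as the outlier step of RKM is an assumption, and the residual values are hypothetical.

```python
import numpy as np

def soft_threshold(r, lam):
    # Minimiser over o of 0.5*(r - o)**2 + lam*|o|  (convex penalty).
    return np.sign(r) * np.maximum(np.abs(r) - lam, 0.0)

def hard_threshold(r, lam):
    # Minimiser over o of 0.5*(r - o)**2 + 0.5*lam**2 * (o != 0)  (non-convex penalty).
    return np.where(np.abs(r) > lam, r, 0.0)

r = np.array([0.5, 2.0, 100.0])  # residuals of three points to their centre
lam = 1.0

# Part of each residual that still pulls on the centre after outlier correction.
pull_soft = r - soft_threshold(r, lam)
pull_hard = r - hard_threshold(r, lam)
print(pull_soft)
print(pull_hard)
```

Under the convex penalty, every large residual keeps a pull of size `lam` on the centre, whereas under the non-convex penalty a residual above the threshold is absorbed entirely and exerts no pull.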




Papers published at the Neural Information Processing Systems Conference.