On Generalization Bounds for Projective Clustering

Neural Information Processing Systems 

Given a set of points, clustering consists of partitioning them into k clusters such that each point is as close as possible to the center of its assigned cluster. Most commonly, the centers are points themselves, which leads to the famous k-median and k-means objectives. One may also choose centers to be j-dimensional subspaces, which gives rise to subspace clustering. In this paper, we consider learning bounds for these problems. That is, given a set of n samples P drawn independently from some unknown but fixed distribution \mathcal{D}, how quickly does a solution computed on P converge to the optimal clustering of \mathcal{D}? We give several near-optimal results.
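The objectives mentioned above can be stated formally. The following notation is a standard formulation consistent with the abstract, not taken verbatim from the paper:

```latex
% Clustering cost of a point set P with respect to a set C of k centers,
% where z = 1 gives the k-median objective and z = 2 gives k-means:
\[
  \mathrm{cost}_z(P, C) \;=\; \sum_{p \in P} \min_{c \in C} \lVert p - c \rVert^z .
\]
% For subspace clustering, C is instead a set of k j-dimensional subspaces,
% and the distance from p to a subspace c is the Euclidean distance to the
% orthogonal projection of p onto c.
%
% The generalization question asks how quickly the empirical cost on a
% sample P of n i.i.d. points converges to the expected cost under D:
\[
  \sup_{C} \;\Bigl|\, \tfrac{1}{n}\,\mathrm{cost}_z(P, C)
    \;-\; \mathbb{E}_{p \sim \mathcal{D}}\bigl[ \min_{c \in C} \lVert p - c \rVert^z \bigr] \Bigr| .
\]
```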