Refined Learning Bounds for Kernel and Approximate k-Means

Neural Information Processing Systems 

Kernel k-means is one of the most popular approaches to clustering, and its theoretical properties have been investigated for decades. However, the existing state-of-the-art risk bounds are of order \mathcal{O}(k/\sqrt{n}), which do not match the known lower bound \Omega(\sqrt{k/n}) in terms of k, where k is the number of clusters and n is the size of the training set. In this paper, we study the statistical properties of kernel k-means and Nyström-based kernel k-means, and obtain optimal clustering risk bounds that improve on the existing ones. In particular, based on a refined upper bound on the Rademacher complexity [21], we first derive an optimal risk bound of rate \mathcal{O}(\sqrt{k/n}) for the empirical risk minimizer (ERM), and then extend it to general cases beyond ERM. We then analyze the statistical effect of the computational approximation in Nyström kernel k-means, and prove that it achieves the same statistical accuracy as exact kernel k-means using only \Omega(\sqrt{nk}) Nyström landmark points. We further relax the requirement on the number of landmark points from \Omega(\sqrt{nk}) to \Omega(\sqrt{n}) under a mild condition.
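To make the Nyström construction concrete, below is a minimal sketch of Nyström-approximated kernel k-means: sample m landmark points, build an approximate feature map from the kernel blocks, and run linear k-means in that feature space. This is an illustration under our own assumptions, not the paper's implementation: the RBF kernel choice, the uniform landmark sampling, and names such as `nystroem_kernel_kmeans` are ours. The landmark budget m is set to roughly \sqrt{nk}, mirroring the \Omega(\sqrt{nk}) rate discussed above.

```python
import numpy as np
from sklearn.cluster import KMeans

def nystroem_kernel_kmeans(X, k, m, gamma=1.0, seed=0):
    """Cluster X into k groups with Nystroem-approximated kernel k-means.

    m is the number of landmark points; the bounds discussed above suggest
    m on the order of sqrt(n * k) suffices to match the exact method.
    (Illustrative sketch; function name and kernel choice are assumptions.)
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    # Sample m landmarks uniformly at random (uniform sub-sampling).
    idx = rng.choice(n, size=m, replace=False)
    landmarks = X[idx]

    # RBF kernel blocks: K_nm (data vs. landmarks), K_mm (landmarks only).
    def rbf(A, B):
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq)

    K_nm = rbf(X, landmarks)
    K_mm = rbf(landmarks, landmarks)

    # Approximate feature map: Phi = K_nm @ K_mm^{-1/2}, computed via the
    # eigendecomposition of K_mm with eigenvalue clipping for stability.
    w, V = np.linalg.eigh(K_mm)
    w = np.maximum(w, 1e-12)
    K_mm_inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    Phi = K_nm @ K_mm_inv_sqrt

    # Linear k-means on Phi is equivalent to kernel k-means under the
    # Nystroem-approximated kernel K_nm @ pinv(K_mm) @ K_nm.T.
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(Phi)

# Example: n = 1000 points, k = 5 clusters, m ~ sqrt(n * k) landmarks.
X = np.random.default_rng(1).normal(size=(1000, 2))
labels = nystroem_kernel_kmeans(X, k=5, m=int(np.ceil(np.sqrt(1000 * 5))))
```

The design point the sketch highlights is the one the abstract trades on: the clustering step only touches the m-dimensional features Phi, so the cost scales with m rather than n, while the theory says m = \Omega(\sqrt{nk}) (or \Omega(\sqrt{n}) under the mild condition) preserves the \mathcal{O}(\sqrt{k/n}) statistical accuracy.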