On the optimality of kernels for high-dimensional clustering
Vankadara, Leena Chennuru, Ghoshdastidar, Debarghya
This paper studies the optimality of kernel methods in high-dimensional data clustering. Recent works have studied the large sample performance of kernel clustering in the high-dimensional regime, where Euclidean distance becomes less informative. However, it is unknown whether popular methods, such as kernel k-means, are optimal in this regime. We consider the problem of high-dimensional Gaussian clustering and show that, with the exponential kernel function, the sufficient conditions for partial recovery of clusters using the NP-hard kernel k-means objective matches the known information-theoretic limit up to a factor of $\sqrt{2}$ for large $k$. It also exactly matches the known upper bounds for the non-kernel setting. We also show that a semi-definite relaxation of the kernel k-means procedure matches up to constant factors, the spectral threshold, below which no polynomial-time algorithm is known to succeed. This is the first work that provides such optimality guarantees for the kernel k-means as well as its convex relaxation. Our proofs demonstrate the utility of the less known polynomial concentration results for random variables with exponentially decaying tails in a higher-order analysis of kernel methods.
Dec-1-2019
- Country:
- Africa > Middle East
- Tunisia > Ben Arous Governorate > Ben Arous (0.04)
- Asia
- Afghanistan > Parwan Province
- Charikar (0.04)
- Japan > Honshū
- Kantō > Kanagawa Prefecture (0.04)
- Afghanistan > Parwan Province
- Europe
- Germany
- Baden-Württemberg > Tübingen Region
- Tübingen (0.04)
- Bavaria > Upper Bavaria
- Munich (0.04)
- Baden-Württemberg > Tübingen Region
- United Kingdom > England
- Cambridgeshire > Cambridge (0.04)
- Germany
- South America > Brazil
- São Paulo (0.04)
- Africa > Middle East
- Genre:
- Research Report (1.00)
- Technology: