A Scalable Framework for Sparse Clustering Without Shrinkage
Zhiyue Zhang, Kenneth Lange, Jason Xu
Clustering, a fundamental activity in unsupervised learning, is notoriously difficult when the feature space is high-dimensional. Fortunately, in many realistic scenarios, only a handful of features are relevant in distinguishing clusters. This has motivated the development of sparse clustering techniques that typically rely on k-means within outer algorithms of high computational complexity. Current techniques also require careful tuning of shrinkage parameters, further limiting their scalability. In this paper, we propose a novel framework for sparse k-means clustering that is intuitive, simple to implement, and competitive with state-of-the-art algorithms. We show that our algorithm enjoys consistency and convergence guarantees. Our core method readily generalizes to several task-specific algorithms such as clustering on subsets of attributes and in partially observed data settings. We showcase these contributions via simulated experiments and benchmark datasets, as well as a case study on mouse protein expression.
Feb-19-2020
- Country:
- North America > United States > California > Los Angeles County
- Long Beach (0.04)
- Los Angeles (0.14)
- Genre:
- Research Report (1.00)
- Industry:
- Health & Medicine (0.88)
- Technology: