A Scalable Framework for Sparse Clustering Without Shrinkage

Zhang, Zhiyue; Lange, Kenneth; Xu, Jason

Feb-19-2020–arXiv.org Machine Learning

Clustering, a fundamental activity in unsupervised learning, is notoriously difficult when the feature space is high-dimensional. Fortunately, in many realistic scenarios, only a handful of features are relevant in distinguishing clusters. This has motivated the development of sparse clustering techniques that typically rely on k-means within outer algorithms of high computational complexity. Current techniques also require careful tuning of shrinkage parameters, further limiting their scalability. In this paper, we propose a novel framework for sparse k-means clustering that is intuitive, simple to implement, and competitive with state-of-the-art algorithms. We show that our algorithm enjoys consistency and convergence guarantees. Our core method readily generalizes to several task-specific algorithms such as clustering on subsets of attributes and in partially observed data settings. We showcase these contributions via simulated experiments and benchmark datasets, as well as a case study on mouse protein expression.

Keywords: algorithm, informative feature, skfr

Country:
- North America > United States > California > Los Angeles County
  - Los Angeles (0.14)
  - Long Beach (0.04)

Genre:
- Research Report (1.00)

Industry:
- Health & Medicine (0.88)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
