Stagewise Learning for Sparse Clustering of Discretely-Valued Data
Zhao, Vincent; Zucker, Steven W.
We study the model-based sparse clustering problem for discrete data using a mixture model of product distributions [9, 7]. This model has applications in many fields, including computational neuroscience, crowdsourcing, and bioinformatics, and it is interesting because it differs technically from the continuous-data problem, where the well-known Gaussian mixture model has been applied successfully. A fundamental difficulty is that, in high-dimensional datasets, some features can be noisy, redundant, or otherwise uninformative for clustering, and these can push clustering algorithms toward inappropriate or uninteresting results. If these uninformative or noisy data points could be eliminated then, we argue, the results should be much more satisfying. This is precisely our goal: to find an informative set of data points and to use these to drive the clustering.
May-27-2016
- Genre:
- Research Report (0.40)
- Industry:
- Health & Medicine > Therapeutic Area > Neurology (0.69)
- Technology: