A Simple Approach to Sparse Clustering

Sep-11-2016–arXiv.org Machine Learning

Consider the problem of sparse clustering, where it is assumed that only a subset of the features are useful for clustering purposes. In the framework of the COSA method of Friedman and Meulman, subsequently improved in the form of the Sparse K-means method of Witten and Tibshirani, a natural and simpler hill-climbing approach is introduced. The new method is shown to be competitive with these two methods and others. Keywords: Sparse Clustering, Hill-climbing, High-dimensional, Feature Selection 1. Introduction Consider a typical setting for clusteringn items based on pairwise dissimilarities, withδ(i,j) denoting the dissimilarity between itemsi,j [n ] {1,...,n } . For concreteness, we assume thatδ(i,j) 0 and δ(i,i) 0 for all i,j [n ] . In principle, if we want to delineateκ clusters, the goal is (for example) to minimize the average within-cluster dissimilarity. Let C n κ denote the class of clusterings ofn items intoκ groups. For C C n κ, its average within-cluster dissimilarity is defined as [C ] k [κ ] 1 C 1 (k) i,j C 1 (k)δ(i,j). If under the Euclidean setting, we further define cluster centers µ k 1 n i C 1 (k)x i with k [κ ], (2) then the within-cluster dissimilarity can be rewritten as follows, [C ] k [κ ] 1 C 1 (k) i,j C 1 (k) x i x j 2 k [κ ] i C 1 (k) x i µ k 2 . The resulting optimization problem is the following: Given (δ(i,j) i,j [n ]), minimize [C ] over C C n κ .

artificial intelligence, machine learning, optimization problem, (17 more...)

arXiv.org Machine Learning

Sep-11-2016

arXiv.org PDF

Add feedback

Country:
- North America > United States > California (0.28)

Genre:
- Research Report (0.82)

Industry:
- Health & Medicine
  - Pharmaceuticals & Biotechnology (0.47)
  - Therapeutic Area > Oncology (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Optimization (1.00)
  - Machine Learning > Statistical Learning
    - Clustering (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found