Clustering
Probabilistic Abstraction Hierarchies
Many domains are naturally organized in an abstraction hierarchy or taxonomy, where the instances in "nearby" classes in the taxonomy are similar. In this pa- per, we provide a general probabilistic framework for clustering data into a set of classes organized as a taxonomy, where each class is associated with a prob- abilistic model from which the data was generated. The clustering algorithm simultaneously optimizes three things: the assignment of data instances to clus- ters, the models associated with the clusters, and the structure of the abstraction hierarchy. A unique feature of our approach is that it utilizes global optimization algorithms for both of the last two steps, reducing the sensitivity to noise and the propensity to local maxima that are characteristic of algorithms such as hierarchi- cal agglomerative clustering that only take local steps. We provide a theoretical analysis for our algorithm, showing that it converges to a local maximum of the joint likelihood of model and data.
An Iterative Improvement Procedure for Hierarchical Clustering
We describe a procedure which finds a hierarchical clustering by hill- climbing. The cost function we use is a hierarchical extension of the k-means cost; our local moves are tree restructurings and node reorder- ings. We show these can be accomplished efficiently, by exploiting spe- cial properties of squared Euclidean distances and by using techniques from scheduling algorithms.
Clustering with the Connectivity Kernel
Clustering aims at extracting hidden structure in dataset. While the prob- lem of finding compact clusters has been widely studied in the litera- ture, extracting arbitrarily formed elongated structures is considered a much harder problem. In this paper we present a novel clustering algo- rithm which tackles the problem by a two step procedure: first the data are transformed in such a way that elongated structures become compact ones. In a second step, these new objects are clustered by optimizing a compactness-based criterion. The advantages of the method over related approaches are threefold: (i) robustness properties of compactness-based criteria naturally transfer to the problem of extracting elongated struc- tures, leading to a model which is highly robust against outlier objects; (ii) the transformed distances induce a Mercer kernel which allows us to formulate a polynomial approximation scheme to the generally NP- hard clustering problem; (iii) the new method does not contain free kernel parameters in contrast to methods like spectral clustering or mean-shift clustering.
Feature Selection in Clustering Problems
A novel approach to combining clustering and feature selection is pre- sented. It implements a wrapper strategy for feature selection, in the sense that the features are directly selected by optimizing the discrimina- tive power of the used partitioning algorithm. On the technical side, we present an efficient optimization algorithm with guaranteed local con- vergence property. The only free parameter of this method is selected by a resampling-based stability analysis. Experiments with real-world datasets demonstrate that our method is able to infer both meaningful partitions and meaningful subsets of features.
Learning the k in k-means
Clustering algorithms are useful tools for data mining, compression, probability density es- timation, and many other important tasks. However, most clustering algorithms require the user to specify the number of clusters (called k), and it is not always clear what is the best value for k. Figure 1 shows examples where k has been improperly chosen. Choosing k is often an ad hoc decision based on prior knowledge, assumptions, and practical experience. Choosing k is made more difficult when the data has many dimensions, even when clusters are well-separated. Center-based clustering algorithms (in particular k-means and Gaussian expectation- maximization) usually assume that each cluster adheres to a unimodal distribution, such as Gaussian. With these methods, only one center should be used to model each subset of data that follows a unimodal distribution. If multiple centers are used to describe data drawn from one mode, the centers are a needlessly complex description of the data, and in fact the multiple centers capture the truth about the subset less well than one center. In this paper we present a simple algorithm called G-means that discovers an appropriate k using a statistical test for deciding whether to split a k-means center into two centers.
Pairwise Clustering and Graphical Models
Signi(cid:2)cant progress in clustering has been achieved by algorithms that are based on pairwise af(cid:2)nities between the datapoints. In particular, spectral clustering methods have the advantage of being able to divide arbitrarily shaped clusters and are based on ef(cid:2)cient eigenvector calcu- lations. However, spectral methods lack a straightforward probabilistic interpretation which makes it dif(cid:2)cult to automatically set parameters us- ing training data. In this paper we use the previously proposed typical cut framework for pairwise clustering. We show an equivalence between calculating the typical cut and inference in an undirected graphical model.
Efficient Out-of-Sample Extension of Dominant-Set Clusters
Dominant sets are a new graph-theoretic concept that has proven to be relevant in pairwise data clustering problems, such as image seg- mentation. They generalize the notion of a maximal clique to edge- weighted graphs and have intriguing, non-trivial connections to continu- ous quadratic optimization and spectral-based grouping. We address the problem of grouping out-of-sample examples after the clustering process has taken place. This may serve either to drastically reduce the compu- tational burden associated to the processing of very large data sets, or to efficiently deal with dynamic situations whereby data sets need to be updated continually. We show that the very notion of a dominant set of- fers a simple and efficient way of doing this.
Maximum Margin Clustering
We propose a new method for clustering based on finding maximum mar- gin hyperplanes through data. By reformulating the problem in terms of the implied equivalence relation matrix, we can pose the problem as a convex integer program. Although this still yields a difficult com- putational problem, the hard-clustering constraints can be relaxed to a soft-clustering formulation which can be feasibly solved with a semidef- inite program. Since our clustering technique only depends on the data through the kernel matrix, we can easily achieve nonlinear clusterings in the same manner as spectral clustering. Experimental results show that our maximum margin clustering technique often obtains more accurate results than conventional clustering methods.
Hierarchical Clustering of a Mixture Model
In this paper we propose an efficient algorithm for reducing a large mixture of Gaussians into a smaller mixture while still preserv- ing the component structure of the original model; this is achieved by clustering (grouping) the components. The method minimizes a new, easily computed distance measure between two Gaussian mixtures that can be motivated from a suitable stochastic model and the iterations of the algorithm use only the model parameters, avoiding the need for explicit resampling of datapoints.
Size Regularized Cut for Data Clustering
We present a novel spectral clustering method that enables users to incorporate prior knowledge of the size of clusters into the clustering process. The cost function, which is named size regularized cut (SRcut), is defined as the sum of the inter-cluster similarity and a regularization term measuring the relative size of two clusters. Finding a partition of the data set to minimize SRcut is proved to be NP-complete. An approximation algorithm is proposed to solve a relaxed version of the optimization problem as an eigenvalue problem. Evaluations over different data sets demonstrate that the method is not sensitive to outliers and performs better than normalized cut.